The obabel command line program converts chemical objects (currently
molecules or reactions) from one file format to another. The Open Babel
graphical user interface (GUI) is an alternative to using the command
line and has the same capabilities. Since Open Babel 2.3, the GUI is
available cross-platform on Windows, Linux and MacOSX. On Windows, you
can find it in the Start Menu in the Open Babel folder; on Linux and
MacOSX, the GUI can be started with the obgui command.
Since the functionality of the GUI mirrors that of obabel, you should
consult the previous chapter to learn about available features and how
to use them. This chapter describes the general use of the GUI and then
focuses on features that are specific to the GUI.
Basic operation
Although the GUI presents many options, the basic operation is straightforward:
Select the type of the input file from the dropdown list.
Click the “...” button and select the file. Its contents are displayed in the textbox below.
Choose the output format and file in a similar way. You can merely display the output without saving it by not selecting an output file or by checking “Output below only..”.
Click the “Convert” button.
The message window below the button gives the number of molecules converted, and the contents of the output file are displayed.
By default, all the molecules in an input file are converted if the output format allows multiple molecules.
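Because the GUI mirrors obabel, the steps above correspond to a single command line. The following Python sketch only assembles that command as an argument list; the helper name `build_obabel_command` is hypothetical, and the sketch assumes the usual obabel conventions of `-i<format>`, `-o<format>` and `-O <file>`. Leaving out the output file corresponds to displaying the output without saving it.

```python
import shlex


def build_obabel_command(input_file, output_format, output_file=None, input_format=None):
    """Return the argument list for a basic obabel conversion (hypothetical helper)."""
    cmd = ["obabel"]
    if input_format:
        # Explicit input format, for files whose extension is not standard.
        cmd.append(f"-i{input_format}")
    cmd.append(input_file)
    cmd.append(f"-o{output_format}")
    if output_file:
        # Without -O, obabel writes the result to standard output.
        cmd += ["-O", output_file]
    return cmd


cmd = build_obabel_command("serotonin.mol", "smi")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run` on a machine where obabel is installed.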
Options
The options in the middle are those appropriate for the type
of chemical object being converted (molecule or reaction) and the input
and output formats. They are derived from the description text that is
displayed with the -Hxxx option in the command line interface and with
the “Format info” buttons here. You can switch off the display of any of
the various types of option using the View menu if the screen is
getting too cluttered.
Multiple input files
You can select
multiple input files in the input file dialog in the normal way (for
example, using the Control key in Windows). In the input filename box,
each filename is displayed relative to the path shown just above the
box, which is the path of the first file. You can display any of the
files by moving the highlight with Tab/Shift Tab, Page Up/Down, the
mouse wheel, or by double clicking.
Selecting one or more new file
names normally removes those already present, but they can instead be
appended by holding the Control key down when leaving the file selection
dialog.
Files can also be dragged and dropped (e.g. from Windows Explorer). The dropped files are added to the list when the Control key is pressed and replace the existing files when it is not.
Normally each file is converted according to its extension, and the input files do not all have to be in the same format. If you want to use non-standard file names, set the checkbox “Use this format for all input files...”.
If you want to combine multiple molecules (from one or more files) into a single molecule with disconnected parts, use the option “Join all input molecules...”.
Wildcards in filenames
When input filenames are typed in directly, any of them can contain the
wildcard characters * and ?. Pressing Enter replaces these with a list
of the matching files. The wildcarded names can be restored by pressing
Enter while holding down the Shift key. The original and the expanded
versions behave the same when the “Convert” button is pressed.
By including the wildcard * in both the input and output filenames you can
carry out batch conversion. Suppose there are files first.smi,
second.smi and third.smi. Using *.smi as the input filename and *.mol as the
output filename would produce three files: first.mol, second.mol and
third.mol. If the output filename was NEW_*.mol, then the output files
would be NEW_first.mol, etc.
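The naming rule in this example can be illustrated with a few lines of Python. This is only a sketch of the behaviour described above, not Open Babel's implementation; the function name `batch_output_names` is hypothetical, and it assumes a single * in each pattern.

```python
import fnmatch


def batch_output_names(input_pattern, output_pattern, filenames):
    """Map each filename matching input_pattern to its output name.

    Assumes exactly one '*' in each pattern, as in *.smi -> NEW_*.mol.
    """
    prefix, suffix = input_pattern.split("*", 1)
    mapping = {}
    for name in filenames:
        if fnmatch.fnmatchcase(name, input_pattern):
            # The text matched by '*' is what remains between prefix and suffix.
            stem = name[len(prefix):len(name) - len(suffix)]
            mapping[name] = output_pattern.replace("*", stem, 1)
    return mapping


files = ["first.smi", "second.smi", "third.smi", "notes.txt"]
print(batch_output_names("*.smi", "NEW_*.mol", files))
```

Files that do not match the input pattern (here the hypothetical notes.txt) are simply left out of the mapping.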
Local input
By checking the
“Input below...” checkbox you can type the input text directly. The text
box changes colour to remind you that it is this text and not the
contents of any files that will be converted.
Output file
The
output file name can be fully specified with a path, but if it is not,
then it is considered to be relative to the input file path.
Graphical display
The chemical structures being converted can be displayed (as SVG) in an
external program. By default this is Firefox, but it can be changed from
an item on the View menu (for instance, Opera and Chrome work fine). When
“Display in firefox” (under the output file name) is checked, the
structures are shown in a new Firefox tab. With multiple molecules
the display can be zoomed (mouse wheel) and panned (dragging with the mouse
button held down). Up to 100 molecules are easily handled, but with more
the display may be slow to manipulate. It may also be slow to generate,
especially if 2D atom coordinates have to be calculated (e.g. from
SMILES). A new Firefox tab is opened each time Convert is pressed.
Using a restricted set of formats
It is likely that you will only be interested in a subset of the large
range of formats handled by Open Babel. You can restrict the choice
offered in the dropdown boxes, which makes routine selection easier.
Clicking “Select set of formats” on the View menu lets you choose which
formats are displayed. Subsequently, clicking “Use restricted set
of formats” on the View menu toggles this facility on and off.
Using a restricted set also works around an irritating bug in the Windows
version. In the File Open and Save dialogs, the files displayed can be
filtered by the current format, All Chemical Formats, or All Files. The
All Chemical Formats filter only displays the first 30 possible formats
(alphabetically). The All Files filter does display all files, and the
conversion processes are unaffected.
Other features
Most of the interface parameters, such as the selected format and the window size and position, are remembered between sessions.
Using
the View menu, the input and output text boxes can be set not to wrap
the text. At present you have to restart the program for this to take
effect.
The message box at the top of the output text window receives
program output on error and audit logging, and some progress reports.
It can be expanded by dragging down the divider between the windows.
Example files
In the Windows distribution, there are three chemical files included to try out:
- serotonin.mol which has 3D atom coordinates
- oxamide.cml which is 2D and has a large number of properties that will be seen when converting to SDF
- FourSmallMols.cml which (unsurprisingly) contains four molecules with no atom coordinates and can be used to illustrate the handling of multiple molecules
With FourSmallMols.cml as the input and the output format set to SMI
(which is easy to read), you can convert only the second and third
molecules by entering 2 and 3 in the appropriate option boxes. Or convert
only molecules with C-O single bonds by entering CO in the SMARTS option box.
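For comparison, the same two selections can be written as obabel options. The sketch below assumes the command-line options `-f` (first molecule), `-l` (last molecule) and `-s` (SMARTS match); the helper `selection_options` is hypothetical and only builds argument lists.

```python
def selection_options(first=None, last=None, smarts=None):
    """Translate the GUI selection boxes into obabel options (hypothetical helper)."""
    opts = []
    if first is not None:
        opts += ["-f", str(first)]  # start at this molecule number
    if last is not None:
        opts += ["-l", str(last)]   # stop at this molecule number
    if smarts:
        opts += ["-s", smarts]      # convert only molecules matching the SMARTS
    return opts


base = ["obabel", "FourSmallMols.cml", "-osmi"]
second_and_third = base + selection_options(first=2, last=3)
with_c_o_bond = base + selection_options(smarts="CO")
print(second_and_third)
print(with_c_o_bond)
```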
The full tutorial can be found here.
Filtering Structure
Setup
We are going to use a dataset of 16 benzodiazepines. These all share the following substructure (image from Wikipedia):
Create a folder on the Desktop called Work and save benzodiazepines.sdf there
Set up a conversion from SDF to SMI and set benzodiazepines.sdf as the input file
Tick Display in Firefox
Click CONVERT
Remove duplicates
If
you look carefully at the depictions of the first and last molecules
(top left and bottom right) you will notice that they depict the same
molecule.
Look at the SMILES strings for the first and last
molecules. If the two molecules are actually the same, why are the two
SMILES strings different? (Hint: try using CAN - canonical SMILES
instead of SMI.)
We can remove duplicates based on the InChI (for example):
Tick the box beside remove duplicates by descriptor and enter inchi as the descriptor
Click CONVERT
Duplicates
can be removed based on any of the available descriptors. The full list
can be found in the menu under Plugins, descriptors.
Are any of the other descriptors useful for removing duplicates?
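On the command line, duplicate removal is requested with the `--unique` option. The Python sketch below only assembles such a command; the helper `dedup_command` is hypothetical, and passing `inchi` as the parameter simply mirrors the value entered in the GUI above, so treat it as an assumption to check against your Open Babel version.

```python
def dedup_command(input_file, output_format, descriptor=None):
    """Build an obabel command that drops duplicate molecules (hypothetical helper)."""
    cmd = ["obabel", input_file, f"-o{output_format}", "--unique"]
    if descriptor:
        # Optional descriptor on which duplicates are detected.
        cmd.append(descriptor)
    return cmd


# Canonical SMILES output (can) makes identical molecules easy to spot by eye.
cmd = dedup_command("benzodiazepines.sdf", "can", "inchi")
print(" ".join(cmd))
```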
Filtering by substructure
How many of the molecules contain the following substructure?
The SMILES string for this molecule is c1ccccc1F. This is also a valid SMARTS string.
Use the SMARTSviewer at the ZBH Center for Bioinformatics, University
of Hamburg, to verify the meaning of the SMARTS string c1ccccc1F.
Removing potentially toxic molecules
Filtering
a dataset of molecules by substructure is particularly useful if you
need to remove molecules with problematic functional groups. For
example, particular functional groups are associated with toxicological
problems.
Let’s filter the molecules using this substructure:
In the Options section, enter c1ccccc1F into the box labeled Convert only if match SMARTS or mols in file
Click CONVERT.
How many structures are matched?
Now find all those that are not matched by preceding the SMARTS filter with a tilde ~, i.e. ~c1ccccc1F.
Click CONVERT.
How many structures are not matched?
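The matched and unmatched cases have command-line counterparts in obabel's `-s` (convert only if the SMARTS matches) and `-v` (convert only if it does not). The following sketch maps a GUI-style entry, with an optional leading tilde, onto those options; the function `smarts_filter_options` is hypothetical, and the equivalence between the tilde form and `-v` is an assumption.

```python
def smarts_filter_options(gui_entry):
    """Map a GUI SMARTS entry such as '~c1ccccc1F' to obabel options (hypothetical helper)."""
    entry = gui_entry.strip()
    if not entry:
        return []  # empty box: no substructure filter
    if entry.startswith("~"):
        # A leading tilde inverts the filter: keep molecules that do NOT match.
        return ["-v", entry[1:]]
    return ["-s", entry]


base = ["obabel", "benzodiazepines.sdf", "-osmi"]
matched = base + smarts_filter_options("c1ccccc1F")
not_matched = base + smarts_filter_options("~c1ccccc1F")
print(matched)
print(not_matched)
```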
Filter by descriptor
As discussed above, Open Babel provides several descriptors. Here we will focus on the molecular weight, MW.
To begin with, let’s show the molecular weights in the depiction:
Clear the existing title by entering a single space into the box Add or replace molecule title
Set the title to the molecular weight by entering MW into the box Append properties or descriptors in list to title
Click CONVERT
You
should see the molecular weight below each molecule in the depiction.
Notice also that the SMILES output has the molecular weight beside each
molecule. This could be useful for preparing a spreadsheet with the
SMILES string and various calculated properties.
Now let’s sort by molecular weight:
Enter MW into the box Sort by descriptor and click CONVERT
Finally,
here’s how to filter based on molecular weight. Note that none of the
preceding steps are necessary for the filter to work. We will convert
all those molecules with molecular weights between 300 and 320 (in the
following expression & signifies Boolean AND):
Enter MW>300 & MW<320 into the box Filter convert only when tests are true and click CONVERT
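The descriptor steps above also have obabel equivalents: `--append`, `--sort` and `--filter`. The sketch below assembles them into a command; the helper `descriptor_options` is hypothetical. Note that the filter expression is passed as one argument, so in a shell it needs quoting because of the spaces and the & character.

```python
import shlex


def descriptor_options(append=None, sort_by=None, filter_expr=None):
    """Translate the descriptor boxes into obabel options (hypothetical helper)."""
    opts = []
    if append:
        opts += ["--append", append]       # add descriptor values to the title
    if sort_by:
        opts += ["--sort", sort_by]        # order the output by a descriptor
    if filter_expr:
        opts += ["--filter", filter_expr]  # convert only when the tests are true
    return opts


base = ["obabel", "benzodiazepines.sdf", "-osmi"]
cmd = base + descriptor_options(append="MW", sort_by="MW", filter_expr="MW>300 & MW<320")
# shlex.join quotes the filter expression so it is safe to paste into a shell.
print(shlex.join(cmd))
```

As the text notes, the filter works on its own, so `descriptor_options(filter_expr="MW>300 & MW<320")` is enough if the titles and ordering do not matter.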
Filter by property
The SDF format, in common with some other file
formats, allows property fields for each molecule. Open Babel allows the
user to filter on these properties, add their values to the title, or
remove or replace them.