Cheminformatics, also called chemoinformatics, is an interdisciplinary field that combines chemistry, computer science, and information technology to solve problems related to the storage, retrieval, and analysis of chemical data. It involves the use of computational methods and tools to study the properties and behavior of chemical compounds, as well as to design new molecules with desired properties.
Cheminformatics has become increasingly important in drug discovery, as it allows researchers to sift through vast amounts of chemical data to identify potential drug candidates. It is also used in fields such as materials science, environmental science, and agriculture, where the design of new compounds with specific properties is of great importance.
To do this, the structures and properties of compounds must be handled on a computer. Because humans and computers represent compounds differently (a chemist reads a drawn structure, while a program needs a machine-readable encoding), dedicated software is used to bridge the gap.
If you are new to the field of cheminformatics, you may find it challenging to navigate the various concepts and tools involved. However, with a basic understanding of chemistry and computer science, you can begin to explore the exciting world of cheminformatics and its many applications.
RDKit is a powerful open-source toolkit for cheminformatics. It lets scientists, researchers, and developers explore molecular structures with ease. Its core is written in C++ with Python wrappers, and it is widely used in the pharmaceutical industry, academic research, and other fields that deal with chemical compounds.
Whether you're a chemist looking to conduct complex analyses or a developer interested in building chemical applications, RDKit is the perfect toolkit to help you explore the building blocks of life.
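To install RDKit from your notebook, run a cell such as `%pip install rdkit`.
Alternatively, open Anaconda Prompt and run a command such as `conda install -c conda-forge rdkit`.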
To obtain data on molecules using RDKit, we must first create variables that represent the molecules. RDKit has a molecule object that can be used to retrieve information or calculate properties. To create one, the structure of the molecule needs to be passed to RDKit in a format that computers can understand, such as a SMILES string.
We are going to use a part of RDKit called Chem. To use Chem, we first have to import it. Then, we can create a representation of methane by using the MolFromSmiles function in rdkit.Chem.
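A minimal sketch of these two steps, assuming RDKit is installed. The variable name `methane` is illustrative; the SMILES string for methane is simply `C`.

```python
from rdkit import Chem

# "C" is the SMILES string for methane; hydrogens are implicit in SMILES.
methane = Chem.MolFromSmiles("C")

# MolFromSmiles returns None when the SMILES string cannot be parsed,
# so it is worth checking before using the result.
if methane is None:
    raise ValueError("Could not parse the SMILES string")

print(type(methane).__name__)  # Mol
print(methane.GetNumAtoms())   # 1 (only the carbon; hydrogens are implicit)
```

The `methane` variable now holds a molecule object that the rest of RDKit can work with.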
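To read a single molecule: Chem.MolFromXXX(text), where XXX is the input format (for example, Smiles or Smarts)
To write a single molecule: Chem.MolToXXX(mol), where XXX is the output format (for example, MolBlock)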
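The read and write patterns above can be sketched as follows. Ethanol (`CCO`) and all variable names are assumptions chosen for illustration.

```python
from rdkit import Chem

# Read: SMILES string -> molecule object (ethanol as an example)
ethanol = Chem.MolFromSmiles("CCO")

# Write: molecule object -> canonical SMILES string
smiles_out = Chem.MolToSmiles(ethanol)
print(smiles_out)  # CCO

# Write: molecule object -> MOL block (multi-line text listing atoms and bonds)
mol_block = Chem.MolToMolBlock(ethanol)

# Read the MOL block back in; the result describes the same molecule
ethanol_again = Chem.MolFromMolBlock(mol_block)
print(Chem.MolToSmiles(ethanol_again))  # CCO

# MolFromSmarts builds a query pattern rather than an ordinary molecule
hydroxyl = Chem.MolFromSmarts("[OX2H]")
print(ethanol.HasSubstructMatch(hydroxyl))  # True
```

Like MolFromSmiles, the other MolFromXXX readers return None when the input cannot be parsed.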
Reading and Visualizing Molecules from Different Formats
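One possible way to visualize a molecule is to render it to an SVG string with RDKit's drawing module. This is a sketch under assumptions: phenol is an arbitrary example, and the 300 by 300 pixel canvas size is an arbitrary choice.

```python
from rdkit import Chem
from rdkit.Chem.Draw import rdMolDraw2D

# Phenol, read from a SMILES string (example molecule)
mol = Chem.MolFromSmiles("c1ccccc1O")

# Create a 300x300 SVG canvas, compute 2D coordinates, and draw the molecule
drawer = rdMolDraw2D.MolDraw2DSVG(300, 300)
rdMolDraw2D.PrepareAndDrawMolecule(drawer, mol)
drawer.FinishDrawing()

# The finished drawing is returned as SVG text
svg = drawer.GetDrawingText()
print(svg[:60])
```

In a notebook, the SVG text can be shown with IPython.display.SVG(svg). Draw.MolToImage(mol) from rdkit.Chem.Draw is an alternative that returns an image object.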
Adding and Removing Hydrogens
As in the previous example, the molecule does not have explicit hydrogens by default. If you need to work with the hydrogens, you can use one function to add them and another to remove them again.
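A short sketch of adding and removing hydrogens with Chem.AddHs and Chem.RemoveHs. Ethanol is an assumed example molecule. Both functions return a new molecule and leave the input unchanged.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("CCO")  # ethanol, hydrogens implicit
print(mol.GetNumAtoms())         # 3

# AddHs returns a copy in which the hydrogens are explicit atoms
mol_with_h = Chem.AddHs(mol)
print(mol_with_h.GetNumAtoms())  # 9

# RemoveHs returns a copy with the explicit hydrogens removed again
mol_without_h = Chem.RemoveHs(mol_with_h)
print(mol_without_h.GetNumAtoms())  # 3
```

Explicit hydrogens are typically needed for steps such as 3D coordinate generation, while most 2D work is done without them.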
Using the GetAtoms() function to look at atoms and their masses
C 12.011
N 14.007
C 12.011
N 14.007
C 12.011
C 12.011
C 12.011
O 15.999
N 14.007
C 12.011
O 15.999
N 14.007
C 12.011
C 12.011
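The listing above is consistent with caffeine (14 heavy atoms in exactly this order), so the sketch below assumes that molecule. The SMILES string and the variable name `caffeine` are assumptions, not taken from the original text.

```python
from rdkit import Chem

# Assumed molecule: caffeine
caffeine = Chem.MolFromSmiles("CN1C=NC2=C1C(=O)N(C(=O)N2C)C")

# GetAtoms() iterates over the atoms in the order they appear in the SMILES
for atom in caffeine.GetAtoms():
    # GetSymbol() is the element symbol, GetMass() the atomic mass
    print(atom.GetSymbol(), atom.GetMass())
```

Each atom object also exposes other properties, such as its index (GetIdx()) and atomic number (GetAtomicNum()).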
Number of atoms in molecule
RDKit can tell us the number of atoms in our molecule.
By default, implicit hydrogens are not counted. If you want to count hydrogens, pass onlyExplicit=False to the GetNumAtoms function.
14
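A sketch of both counts, again assuming the molecule is caffeine (an assumption that matches the value 14 shown above).

```python
from rdkit import Chem

# Assumed molecule: caffeine, C8H10N4O2
caffeine = Chem.MolFromSmiles("CN1C=NC2=C1C(=O)N(C(=O)N2C)C")

# Default: only the atoms explicitly present in the molecule graph
heavy_atom_count = caffeine.GetNumAtoms()
print(heavy_atom_count)  # 14

# onlyExplicit=False also counts the implicit hydrogens
total_atom_count = caffeine.GetNumAtoms(onlyExplicit=False)
print(total_atom_count)  # 24
```

The difference between the two numbers is the number of implicit hydrogens, here 10.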
Similarly, use the GetBonds() method to get the bonds in a molecule.
SINGLE
AROMATIC
AROMATIC
AROMATIC
AROMATIC
AROMATIC
DOUBLE
AROMATIC
AROMATIC
DOUBLE
AROMATIC
SINGLE
SINGLE
AROMATIC
AROMATIC
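A listing like the one above can be produced by looping over the bonds and printing each bond type. As before, caffeine is an assumed molecule that matches the output shown.

```python
from rdkit import Chem

# Assumed molecule: caffeine
caffeine = Chem.MolFromSmiles("CN1C=NC2=C1C(=O)N(C(=O)N2C)C")

# GetBonds() iterates over the bonds; GetBondType() gives the bond order
for bond in caffeine.GetBonds():
    print(bond.GetBondType())

# Each bond also knows which two atoms it connects
first_bond = caffeine.GetBondWithIdx(0)
print(first_bond.GetBeginAtomIdx(), first_bond.GetEndAtomIdx())  # 0 1
```

Bonds in both fused rings are reported as AROMATIC because RDKit perceives aromaticity when it parses the SMILES string.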
0D Molecular Descriptors
Molecules can be described in a data table by the presence, absence, or total number of particular atoms. For example, the total numbers of carbon, nitrogen, oxygen, or halogen atoms may be enough to describe a molecule for some purposes. However, this type of representation, known as 0D, conveys little about molecular structure and atom connectivity. Examples of 0D descriptors include atom and bond counts, molecular weight, and molar refractivity.
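As an illustration of a 0D description, the sketch below counts the atoms of each element. Caffeine and the variable names are assumptions made for the example.

```python
from collections import Counter

from rdkit import Chem

# Assumed example molecule: caffeine, with hydrogens made explicit
caffeine = Chem.AddHs(Chem.MolFromSmiles("CN1C=NC2=C1C(=O)N(C(=O)N2C)C"))

# Count how many atoms of each element are present
element_counts = Counter(atom.GetSymbol() for atom in caffeine.GetAtoms())
print(sorted(element_counts.items()))
# [('C', 8), ('H', 10), ('N', 4), ('O', 2)]
```

These counts amount to the molecular formula C8H10N4O2. Any other molecule with the same formula would get the same 0D description, which is the limitation described above.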
1D Molecular Descriptors
Descriptors in one-dimensional (1D) representation typically include fragment counts, the number of sp3, sp2, or sp hybridized carbons, hydrogen bond donors and acceptors, Polar Surface Area (PSA), and similar features. These descriptors are often binary values that indicate the presence or absence of specific substructures, or counts of how often they occur.
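Several of the features listed above are available in RDKit's Descriptors module. The sketch below computes a few of them; caffeine is an assumed example molecule.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

# Assumed example molecule: caffeine
caffeine = Chem.MolFromSmiles("CN1C=NC2=C1C(=O)N(C(=O)N2C)C")

h_donors = Descriptors.NumHDonors(caffeine)        # hydrogen bond donors
h_acceptors = Descriptors.NumHAcceptors(caffeine)  # hydrogen bond acceptors
tpsa = Descriptors.TPSA(caffeine)                  # topological polar surface area
frac_sp3 = Descriptors.FractionCSP3(caffeine)      # fraction of sp3 carbons

print(h_donors)   # 0 (caffeine has no N-H or O-H groups)
print(h_acceptors)
print(tpsa)
print(frac_sp3)   # 0.375 (3 methyl carbons out of 8 carbons)
```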
2D and 3D molecular descriptors will be discussed later in advanced topics.
Molecular descriptors are properties of molecules that can be analyzed using statistical methods to make predictions about molecular properties.
To get molecular descriptors from RDKit, we import the Descriptors module.
To get the descriptor you're interested in, the syntax is Descriptors.DescriptorName(molecule_variable)
You can see a full list of RDKit descriptors in the RDKit documentation.
Here are some calculated descriptors:
194.194
74
194.08037556
184.11399999999998
-1.0293
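The five values above are consistent with the following descriptors computed for caffeine. Both the choice of descriptors and the molecule are inferred from the numbers, so treat them as assumptions.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

# Assumed molecule: caffeine
caffeine = Chem.MolFromSmiles("CN1C=NC2=C1C(=O)N(C(=O)N2C)C")

mol_wt = Descriptors.MolWt(caffeine)                      # average molecular weight
n_valence = Descriptors.NumValenceElectrons(caffeine)     # number of valence electrons
exact_mol_wt = Descriptors.ExactMolWt(caffeine)           # monoisotopic mass
heavy_atom_mol_wt = Descriptors.HeavyAtomMolWt(caffeine)  # weight ignoring hydrogens
log_p = Descriptors.MolLogP(caffeine)                     # Crippen logP estimate

for value in (mol_wt, n_valence, exact_mol_wt, heavy_atom_mol_wt, log_p):
    print(value)
```

Every call follows the same Descriptors.DescriptorName(molecule_variable) pattern, so swapping in another descriptor only requires changing the function name.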
To calculate the full set of descriptors for a molecule, use the following:
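One way to do this is to loop over Descriptors.descList, which holds a (name, function) pair for every descriptor in the module. This is a sketch rather than necessarily the approach originally shown; ethanol and the variable names are assumptions.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

mol = Chem.MolFromSmiles("CCO")  # ethanol as an example

# Call every descriptor function on the molecule and collect the results
all_descriptors = {name: func(mol) for name, func in Descriptors.descList}

print(len(all_descriptors))  # number of descriptors available in this RDKit version
for name in ("MolWt", "MolLogP", "TPSA"):
    print(name, all_descriptors[name])
```

Recent RDKit releases also provide Descriptors.CalcMolDescriptors(mol), which returns a similar dictionary in a single call.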