Building machine learning models for ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) predictions puts you on the front lines of drug discovery. It’s exciting work—pushing the boundaries of what’s possible, using algorithms to predict how molecules behave in the body. But the part they don’t tell you? The real battle isn’t in designing models or…
All posts tagged cheminformatics
How to Build Virtual Chemical Libraries with Fragment Analogues: ChemX
I want to introduce you to ChemX, a Python-based program I developed during a hackathon in 2019. You can use it to build virtual chemical libraries from fragment analogues of the target molecule's building blocks. Using the RDKit library, ChemX assembles chemically similar fragments to create a virtual chemical library. What ChemX…
How to Highlight Molecular Substructures: Celebrating Commonalities and Differences
Today, we will dive into molecular substructure highlighting with RDKit, a powerful technique that illuminates the hidden intricacies within molecular compounds. In this tutorial, I will be focusing on two things: If you are interested in more cheminformatics-related tutorials, check my other blog posts here. Section 1: Understanding the Power of Structure…
How to Plot Bar Charts with Chemical Structures
In this tutorial, I will show how to generate bar charts with chemical structures using Python and RDKit. I am adopting the code from Andres Berejnoi’s code repository. His code works with any image as long as you can represent the image as a NumPy array. For the code, you will need the following Python libraries: pandas,…
How to Do Reaction-Based Molecular Transforms Using RDKit and Python
In this tutorial, I will show how to generate molecules through reaction-based transforms in Python. I previously developed the PKS Enumerator and SIME software tools, which are used to design virtual libraries of macrocycles/macrolides. Both were based on a string- or template-based enumeration, and I will write a tutorial on how to do that in a…
Nested Cross-Validation & Cross-Validation Series – Part 2A
This is Part 2A of the Nested Cross-Validation & Cross-Validation Series. I will go through a Python tutorial on implementing k-fold CV regressors using random forest (RF) from scikit-learn with the first dataset: (A) a simple cheminformatics dataset with descriptors and endpoints of interest. In Part 2B, I will cover the same Python tutorial for…
Nested Cross-Validation & Cross-Validation Series – Part 1
A few people have asked me to explain and share the code for Nested Cross-Validation (NeCV). I think it makes sense to explain the basics first, the whats and whys of using NeCV, before diving into the code, so I will be covering these topics in four separate blog posts. For Part 1,…
Circular Dendrogram – Categorical Classification
Today’s tutorial is on applying unsupervised hierarchical clustering in R and generating circular dendrograms with nodes colored based on discrete categories, like in the figure shown below (Figure 1). Disclaimer: The above figure is generated with fake chemical data taken from different projects already published from my PhD years. I used R 4.0.2 and R…
Mordred_MRC_Descriptors in Python – Part 5
This is the last part of the five-part tutorial series that began with the blog post Computing Molecular Descriptors – Intro, in the context of drug discovery. The goal of this post is to explain the Python code for creating new descriptors such as MRC (developed in the MacrolactoneDB study) and for using Mordred descriptors. What are MRC descriptors? MRC descriptors were…
RDKit_2D Descriptors in Python – Part 4
This is Part 4 of the five-part tutorial series that began with the blog post Computing Molecular Descriptors – Intro, in the context of drug discovery. The goal of this post is to explain the Python code for computing 2D RDKit descriptors and exporting them as CSV files. First, install the required library packages using miniconda. The code for RDKit_2D class that…