Building machine learning models for ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) predictions puts you on the front lines of drug discovery. It’s exciting work—pushing the boundaries of what’s possible, using algorithms to predict how molecules behave in the body. But the part they don’t tell you? The real battle isn’t in designing models or…
All posts in Cheminformatics
How to Build Virtual Chemical Libraries with Fragment Analogues: ChemX
Introduction: I want to introduce you to ChemX, a Python-based program I developed during a hackathon in 2019. You can use it to build virtual chemical libraries from fragment analogues of the target molecule's building blocks. Using the RDKit library, ChemX assembles chemically similar fragments to create a virtual chemical library. What ChemX…
How to Scrape FDA Drug Approval Data with Python
Personal Update: Before we embark on today’s tutorial, I wanted to share a personal update with you that sheds light on my recent hiatus from blogging. In the past few months, I’ve been immersed in the world of motherhood, cherishing precious moments with my newborn, Ellie. As I transition back to my role in the…
How to Highlight Molecular Substructures: Celebrating Commonalities and Differences
Introduction: Today, we will dive into molecular substructure highlighting with RDKit – a powerful technique that illuminates the hidden intricacies within molecular compounds. In this tutorial, I will be focusing on two things: If you are interested in more cheminformatics-related tutorials, check my other blog posts here. Section 1: Understanding the Power of Structure…
How to Plot Bar Charts with Chemical Structures
In this tutorial, I will show how to generate bar charts with chemical structures using Python and RDKit. I am adapting the code from Andres Berejnoi’s code repository. His code works with any image as long as you can represent the image as a NumPy array. For the code, you will need the following Python libraries: pandas,…
How to Do Reaction-Based Molecular Transforms Using RDKit and Python
In this tutorial, I will show how to generate molecules with reaction-based transforms in Python. I previously developed the PKS Enumerator and SIME software tools, which are used to design virtual libraries of macrocycles/macrolides. Both were based on a string- or template-based enumeration, and I will write a tutorial on how to do that in a…
How to Merge Multiple Datasets with Pandas and Python – Part 1
Today’s tutorial is on how to merge multiple datasets using the pandas library in Python. We will add new columns based on a key column, and we will also aggregate information for the same column names from various datasets. I have made five sample datasets (A1.csv, A2.csv, A3.csv, A4.csv, A5.csv) that we will be merging.…
How to Box Plot with Python
This blog post is for readers as well as myself. In this tutorial, I will show how to make different types of boxplots including horizontal, vertical, grouped boxplots, and interactive ones. It’s not meant to be comprehensive. It’s just a collection of different styles and visualizations that I like. For the code, you will need…
How To Curate Chemical Data for Cheminformatics
In early 2021, I gave a talk at the MIDD+ Conference held by Simulations Plus Inc. on data curation using one of the projects that I worked on — the Madin-Darby Canine Kidney (MDCK) project. In this blog post, I will be focusing on the general data curation aspects of that project. Let me emphasize…
Nested Cross-Validation & Cross-Validation Series – Part 3
This is part 3 of the Nested Cross-Validation & Cross-Validation Series where I will explain the algorithm of nested cross-validation (NeCV), and compare Cross-Validation and NeCV. Please read this blog first if you need to learn about cross-validation so that you can dive into NeCV after. I would like to first clarify that there are…