Taming the Chaos: Cleaning Data for Reliable ADMET Models

Building machine learning models for ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) predictions puts you on the front lines of drug discovery. It’s exciting work—pushing the boundaries of what’s possible, using algorithms to predict how molecules behave in the body. But the part they don’t tell you? The real battle isn’t in designing models or…

How to Build Virtual Chemical Libraries with Fragment Analogues: ChemX

Introduction: I want to introduce you to ChemX, a Python-based program I developed during a hackathon in 2019. You can use it to build virtual chemical libraries from fragment analogues of the target molecule's building blocks. Using the RDKit library, ChemX assembles chemically similar fragments into a virtual chemical library. What ChemX…

How to Highlight Molecular Substructures: Celebrating Commonalities and Differences

Introduction: Today, we will dive into molecular substructure highlighting with RDKit – a powerful technique that illuminates the hidden intricacies within molecular compounds. In this tutorial, I will be focusing on two things: If you are interested in more cheminformatics-related tutorials, check my other blog posts here. Section 1: Understanding the Power of Structure…

Nested Cross-Validation & Cross-Validation Series – Part 2A

This is Part 2A of the Nested Cross-Validation & Cross-Validation Series. I will go through a Python tutorial on implementing k-fold CV regressors using random forest (RF) from scikit-learn with the first dataset: (A) a simple cheminformatics dataset with descriptors and endpoints of interest. In Part 2B, I will cover the same Python tutorial for…

RDKit_2D Descriptors in Python – Part 4

This is part 4 of the five-part tutorial series that began with the blog post Computing Molecular Descriptors – Intro, in the context of drug discovery. The goal of this post is to explain the Python code for computing 2D RDKit descriptors and exporting them as CSV files. First, install the required library packages using Miniconda. The code for the RDKit_2D class that…