In early 2021, I gave a talk on data curation at the MIDD+ Conference held by Simulations Plus Inc., using one of the projects I worked on: the Madin-Darby Canine Kidney (MDCK) project. In this blog post, I will focus on the general data curation aspects of that project. Let me emphasize…
Nested Cross-Validation & Cross-Validation Series – Part 3
This is Part 3 of the Nested Cross-Validation & Cross-Validation Series, where I explain the nested cross-validation (NeCV) algorithm and compare cross-validation with NeCV. If you need to learn about cross-validation first, please read this blog before diving into NeCV. I would like to first clarify that there are…
Nested Cross-Validation & Cross-Validation Series – Part 2B
Please check out the previous blog posts from this series if you haven’t done so already: Part 1, on the algorithm for k-fold cross-validation, and Part 2A of the Nested Cross-Validation & Cross-Validation Series, where I went through a Python tutorial on implementing k-fold CV regressors using random forest (RF) from scikit-learn with a simple cheminformatics dataset with descriptors…
Nested Cross-Validation & Cross-Validation Series – Part 2A
This is Part 2A of the Nested Cross-Validation & Cross-Validation Series. I will go through a Python tutorial on implementing k-fold CV regressors using random forest (RF) from scikit-learn with the first dataset: (A) a simple cheminformatics dataset with descriptors and endpoints of interest. In Part 2B, I will cover the same Python tutorial for…
Nested Cross-Validation & Cross-Validation Series – Part 1
A few people have asked me to explain and share the code for Nested Cross-Validation. I think it makes sense to explain the basics first, the what and why of using NeCV, before diving into the code, so I will cover these topics in four separate blog posts. For Part 1,…
Circular Dendrogram – Categorical Classification
Today’s tutorial covers applying unsupervised hierarchical clustering in R and generating circular dendrograms with nodes colored by discrete category, as in the figure shown below (Figure 1). Disclaimer: The above figure was generated with fake chemical data taken from different projects already published during my PhD years. I used R 4.0.2 and R…
Mordred_MRC_Descriptors in Python – Part 5
This is the last part of the five-part tutorial series that follows the blog post Computing Molecular Descriptors – Intro, in the context of drug discovery. The goal of this post is to explain the Python code for creating new descriptors such as MRC (developed in the MacrolactoneDB study) and for using Mordred descriptors. What are MRC descriptors? MRC descriptors were…
RDKit_2D Descriptors in Python – Part 4
This is Part 4 of the five-part tutorial series that follows the blog post Computing Molecular Descriptors – Intro, in the context of drug discovery. The goal of this post is to explain the Python code for computing 2D RDKit descriptors and exporting them as CSV files. First, install the required library packages using Miniconda. The code for the RDKit_2D class that…
ECFP6 Fingerprints in Python – Part 3
This is Part 3 of the five-part tutorial series that follows the blog post Computing Molecular Descriptors – Intro, in the context of drug discovery. The goal of this post is to explain the Python code for computing Morgan ECFP fingerprints, also known as ECFP6 (radius = 3) connectivity fingerprints. What Are ECFP Fingerprints? Please read this article and…
MACCS Fingerprints in Python – Part 2
This is Part 2 of the five-part tutorial series that follows the previous blog post, Computing Molecular Descriptors – Intro, in the context of drug discovery. The goal of this post is to explain the Python code for computing MACCS fingerprints. Please read this blog to familiarize yourself with MACCS. The 166 public keys (fragment definitions) of MACCS in…