Nested Cross-Validation & Cross-Validation Series – Part 2B

Please check out the previous blog posts from this series if you haven’t done so already: Part 1 (algorithm for k-fold Cross-Validation) and Part 2A of the Nested Cross-Validation & Cross-Validation Series, where I went through a Python tutorial on implementing k-fold CV regressors using random forest (RF) from scikit-learn with a simple cheminformatics dataset with descriptors…

Nested Cross-Validation & Cross-Validation Series – Part 2A

This is Part 2A of the Nested Cross-Validation & Cross-Validation Series. I will go through a Python tutorial on implementing k-fold CV regressors using random forest (RF) from scikit-learn with the first dataset: (A) a simple cheminformatics dataset with descriptors and endpoints of interest. In Part 2B, I will cover the same Python tutorial for…

RDKit_2D Descriptors in Python – Part 4

This is Part 4 of the five-part tutorial series that follows the blog post Computing Molecular Descriptors – Intro, in the context of drug discovery. The goal of this post is to explain the Python code for computing 2D RDKit descriptors and exporting them as CSV files. First, install the required library packages using miniconda. The code for the RDKit_2D class that…

ECFP6 Fingerprints in Python – Part 3

This is Part 3 of the five-part tutorial series that follows the blog post Computing Molecular Descriptors – Intro, in the context of drug discovery. The goal of this post is to explain the Python code for computing Morgan ECFP fingerprints, also known as ECFP6 (radius = 3) connectivity fingerprints. What Are ECFP Fingerprints? Please read this article and…

MACCS Fingerprints in Python – Part 2

This is Part 2 of the five-part tutorial series that follows the previous blog post, Computing Molecular Descriptors – Intro, in the context of drug discovery. The goal of this post is to explain the Python code for computing MACCS fingerprints. Please read this blog to familiarize yourself with MACCS. The 166 public keys (fragment definitions) of MACCS in…

Computing Molecular Descriptors – Part 1

I will write a five-part tutorial series on implementing Python code to compute different sets of 2D molecular descriptors and fingerprints that are widely used in drug discovery. Many thanks to the first-year Ph.D. students who asked me to write tutorials on cheminformatics topics such as these. I welcome readers to…

PCA Visualized with 3D Scatter Plots

Today’s tutorial is on applying Principal Component Analysis (PCA, a popular feature extraction technique) to your chemical datasets and visualizing the results in 3D scatter plots. Quick Introduction on PCA! The following short description gives a good idea of what PCA is if you aren’t familiar with it. Principal Component Analysis (PCA) is a linear dimensionality reduction technique…