This is part of the five-part tutorial series introduced in the previous blog post, Computing Molecular Descriptors – Intro, in the context of drug discovery. The goal of this post is to explain the Python code for computing MACCS fingerprints. Please read this blog to familiarize yourself with MACCS. The 166 public keys (fragment definitions) of MACCS in…
All posts in May 2020
Computing Molecular Descriptors – Part 1
I will write a five-part tutorial series on implementing Python code to compute different sets of 2D molecular descriptors & fingerprints, which are widely used in drug discovery. Many thanks to the first-year Ph.D. students who asked me to write tutorials on cheminformatics topics such as these. I welcome readers to…
PCA Visualized with 3D Scatter Plots
Today’s tutorial is on applying Principal Component Analysis (PCA, a popular feature extraction technique) to your chemical datasets and visualizing the results in 3D scatter plots. Quick Introduction to PCA! The following short description gives a good idea of what PCA is if you aren’t familiar with it. Principal Component Analysis (PCA) is a linear dimensionality reduction technique…
11 Tips for a Great PhD Journey
I recently finished my PhD in Cheminformatics (from the Department of Chemistry) at North Carolina State University. I worked on my PhD for three and a half years; I started it in August 2016 and defended on March 24th 2020. During that time, I published five articles (three as first author), one article is currently…
How to Generate Chemical Space Visualizations with R & Gephi
Today, I want to write a tutorial on how to generate chemical space visualizations using a combination of R and Gephi. I have found them to be a powerful way of assessing the chemical data and finding hidden patterns that could be crucial in estimating the biological endpoints of interest. Before we go on, let…
How to Compile All Mol Files into SDF file
Are you tired of creating one molecule at a time using Marvin View? I know the struggle. My folder was overflowing with individual molecule files, and it was becoming a nightmare to efficiently dock them against my target protein. But fear not, because I have found a solution, with the help of a research member…
SIME: Synthetic Insight-based Macrolide Enumerator
Abstract: We report on a new cheminformatics enumeration technology, SIME (synthetic insight-based macrolide enumerator), a new and improved software technology. SIME (freely available on GitHub) can enumerate fully assembled macrolides with synthetic feasibility by utilizing the constitutional and structural knowledge extracted from biosynthetic aspects of macrolides. The software takes into account key information such as…
PKS Enumerator
Abstract: We report on the development of a cheminformatics enumeration technology and the analysis of a resulting large dataset of virtual macrolide scaffolds. Although macrolides have been shown to have valuable biological properties, there is no ready-to-screen virtual library of diverse macrolides in the public domain. Conducting molecular modeling (especially virtual screening) of these complex molecules is…
In Loving Memories of Grandma.
Originally published on Monday, October 3, 2016. It has been a year since my grandma passed away. Losing her was one of the hardest things I had to do, and it took me several months to come to terms with that. I remember those drastic and depressing first few weeks after she passed. Fall 2015…
Goodbye, Grandma.
Originally published on Tuesday, October 20, 2015. It has been two days, but I keep staring at a single line of message. And, it still doesn’t make sense. The line was compact, yet powerful enough to shake my day. How did all this happen in merely half a day? You were just hospitalized when my…