Phyo Phyo Kyaw Zin

How to Box Plot with Python

January 23, 2022 · Zin · Cheminformatics

This blog post is for my readers as well as for myself. In this tutorial, I will show how to make different types of box plots, including horizontal, vertical, grouped, and interactive ones. It’s not meant to be comprehensive; it’s just a collection of styles and visualizations that I like. For the code, you will need…
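As a rough illustration of the vertical, horizontal, and grouped styles (interactive plots are left out), here is a minimal Matplotlib sketch. The data are random numbers standing in for real measurements, and the function name `make_boxplots`, the category labels, and the two "conditions" in the grouped panel are all hypothetical. `vert=False` is what flips a box plot to horizontal; recent Matplotlib releases also accept `orientation="horizontal"` for the same purpose.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np


def make_boxplots(seed=0):
    """Draw vertical, horizontal, and grouped box plots of random (hypothetical) data."""
    rng = np.random.default_rng(seed)
    labels = ["A", "B", "C"]  # hypothetical category names
    # Two hypothetical conditions, 50 random values per category each
    cond1 = [rng.normal(loc=mu, scale=1.0, size=50) for mu in (0.0, 1.0, 2.0)]
    cond2 = [rng.normal(loc=mu + 0.5, scale=1.0, size=50) for mu in (0.0, 1.0, 2.0)]

    fig, (ax_v, ax_h, ax_g) = plt.subplots(1, 3, figsize=(12, 4))

    # Vertical box plot (Matplotlib's default orientation)
    bp_v = ax_v.boxplot(cond1)
    ax_v.set_xticklabels(labels)
    ax_v.set_title("Vertical")

    # Horizontal box plot: vert=False lays the boxes along the x-axis
    bp_h = ax_h.boxplot(cond1, vert=False)
    ax_h.set_yticklabels(labels)
    ax_h.set_title("Horizontal")

    # Grouped box plot: shift each condition slightly left/right of its category tick
    centers = np.arange(1, len(labels) + 1)
    bp_g1 = ax_g.boxplot(cond1, positions=centers - 0.2, widths=0.35,
                         patch_artist=True, manage_ticks=False)
    bp_g2 = ax_g.boxplot(cond2, positions=centers + 0.2, widths=0.35,
                         patch_artist=True, manage_ticks=False)
    for box in bp_g1["boxes"]:
        box.set_facecolor("tab:blue")
    for box in bp_g2["boxes"]:
        box.set_facecolor("tab:orange")
    ax_g.set_xticks(centers)
    ax_g.set_xticklabels(labels)
    ax_g.legend([bp_g1["boxes"][0], bp_g2["boxes"][0]], ["condition 1", "condition 2"])
    ax_g.set_title("Grouped")

    fig.tight_layout()
    return fig, (bp_v, bp_h, bp_g1, bp_g2)


if __name__ == "__main__":
    fig, _ = make_boxplots()
    fig.savefig("boxplots.png", dpi=150)  # hypothetical output filename
```

Passing `manage_ticks=False` in the grouped panel keeps the second `boxplot` call from overwriting the tick positions of the first, so the category ticks are set once, by hand, at the group centers.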

How To Curate Chemical Data for Cheminformatics

January 11, 2022 (updated March 22, 2023) · Zin · Cheminformatics

In early 2021, I gave a talk on data curation at the MIDD+ Conference held by Simulations Plus Inc., using one of the projects I worked on as the example: the Madin-Darby Canine Kidney (MDCK) project. In this blog post, I will focus on the general data curation aspects of that project. Let me emphasize…

Nested Cross-Validation & Cross-Validation Series – Part 3

July 19, 2021 · Zin · Cheminformatics

This is Part 3 of the Nested Cross-Validation & Cross-Validation Series, in which I explain the nested cross-validation (NeCV) algorithm and compare cross-validation with NeCV. If you need to learn about cross-validation first, please read this blog post so that you can dive into NeCV afterwards. I would like to first clarify that there are…
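To make the nested structure concrete, here is a minimal scikit-learn sketch, not the code from the post: a `GridSearchCV` hyperparameter search (the inner loop) is wrapped in `cross_val_score` (the outer loop), so each outer test fold is never seen during tuning. The synthetic `make_regression` data, the random forest parameter grid, the fold counts, and the helper names are assumptions for illustration. For comparison, `non_nested_score` reports the best inner-CV score from a single search over all the data; because the same folds both choose the hyperparameters and report the score, that number tends to be optimistic.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score


def make_search(seed=0):
    """Inner loop: a grid search over a small, hypothetical random forest grid."""
    inner_cv = KFold(n_splits=3, shuffle=True, random_state=seed)
    param_grid = {"max_depth": [3, None], "n_estimators": [20, 50]}
    return GridSearchCV(RandomForestRegressor(random_state=seed), param_grid,
                        cv=inner_cv, scoring="r2")


def nested_cv_scores(X, y, seed=0):
    """Outer loop: each outer training split reruns the whole inner search,
    and the tuned model is scored on an outer test fold it never saw."""
    outer_cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    return cross_val_score(make_search(seed), X, y, cv=outer_cv, scoring="r2")


def non_nested_score(X, y, seed=0):
    """Plain CV for comparison: the same folds pick the hyperparameters and report the score."""
    search = make_search(seed)
    search.fit(X, y)
    return search.best_score_


if __name__ == "__main__":
    X, y = make_regression(n_samples=100, n_features=10, noise=10.0, random_state=0)
    nested = nested_cv_scores(X, y)
    print(f"Nested CV R2:     {nested.mean():.3f} +/- {nested.std():.3f}")
    print(f"Non-nested CV R2: {non_nested_score(X, y):.3f}")
```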

Nested Cross-Validation & Cross-Validation Series – Part 2B

January 31, 2021 (updated February 2, 2021) · Zin · Cheminformatics

Please check out the previous blog posts from this series if you haven’t done so already: Part 1, on the algorithm for k-fold cross-validation, and Part 2A of the Nested Cross-Validation & Cross-Validation Series, where I went through a Python tutorial on implementing k-fold CV regressors using random forest (RF) from scikit-learn with a simple cheminformatics dataset with descriptors…

Nested Cross-Validation & Cross-Validation Series – Part 2A

December 16, 2020 · Zin · Cheminformatics

This is Part 2A of the Nested Cross-Validation & Cross-Validation Series. I will go through a Python tutorial on implementing k-fold CV regressors using random forest (RF) from scikit-learn with the first dataset: (A) a simple cheminformatics dataset with descriptors and endpoints of interest. In Part 2B, I will cover the same Python tutorial for…
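The k-fold loop itself can be sketched with scikit-learn's `KFold` and `RandomForestRegressor`: each of the k folds is held out once as the test set while the model is trained on the remaining k - 1 folds. In this minimal sketch, synthetic `make_regression` data stand in for a descriptor matrix and an endpoint; the helper name `kfold_rf`, k = 5, and the forest settings are assumptions, not the settings used in the post.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold


def kfold_rf(X, y, k=5, seed=42):
    """Train and evaluate a random forest regressor once per fold; collect per-fold metrics."""
    kf = KFold(n_splits=k, shuffle=True, random_state=seed)
    results = []
    for fold, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
        # Fit on the k - 1 training folds only
        model = RandomForestRegressor(n_estimators=100, random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        # Evaluate on the single held-out fold
        pred = model.predict(X[test_idx])
        results.append({
            "fold": fold,
            "n_test": len(test_idx),
            "r2": r2_score(y[test_idx], pred),
            "rmse": float(np.sqrt(mean_squared_error(y[test_idx], pred))),
        })
    return results


if __name__ == "__main__":
    # Synthetic stand-in for a descriptor matrix X and an endpoint y
    X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)
    for r in kfold_rf(X, y):
        print(f"fold {r['fold']}: R2 = {r['r2']:.3f}, RMSE = {r['rmse']:.2f}")
```

Every sample lands in exactly one test fold, so the per-fold test sizes add up to the dataset size; averaging the per-fold R2 and RMSE gives the usual k-fold CV estimate.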

Nested Cross-Validation & Cross-Validation Series – Part 1

November 16, 2020 (updated November 18, 2020) · Zin · Cheminformatics

A few people have asked me to explain and share the code for nested cross-validation (NeCV). I think it makes sense to explain the basic whats and whys of NeCV before diving into the code, so I will cover these topics in four separate blog posts. For Part 1,…

Circular Dendrogram – Categorical Classification

July 13, 2020 (updated November 18, 2020) · Zin · Cheminformatics

Today’s tutorial is on applying unsupervised hierarchical clustering in R and generating circular dendrograms with nodes colored by discrete categories, as in Figure 1 below. Disclaimer: Figure 1 was generated with fake chemical data taken from different already-published projects from my PhD years. I used R 4.0.2 and R…

Mordred_MRC_Descriptors in Python – Part 5

June 14, 2020 (updated November 18, 2020) · Zin · Cheminformatics

This is the last part of the five-part tutorial series accompanying the blog post Computing Molecular Descriptors – Intro, in the context of drug discovery. The goal of this post is to explain the Python code for creating new descriptors such as MRC (developed in the MacrolactoneDB study) and for using Mordred descriptors. What are MRC descriptors? MRC descriptors were…

RDKit_2D Descriptors in Python – Part 4

June 4, 2020 (updated June 8, 2020) · Zin · Cheminformatics

This is Part 4 of the five-part tutorial series accompanying the blog post Computing Molecular Descriptors – Intro, in the context of drug discovery. The goal of this post is to explain the Python code for computing 2D RDKit descriptors and exporting them as CSV files. First, install the required library packages using Miniconda. The code for the RDKit_2D class that…
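A minimal sketch of the same idea, assuming RDKit and pandas are installed: every descriptor name in `Descriptors._descList` is passed to RDKit's `MolecularDescriptorCalculator`, one row is computed per SMILES string, and the table is written out with `DataFrame.to_csv`. This is not the post's `RDKit_2D` class; the function name `compute_rdkit_2d`, the example SMILES, and the output filename are hypothetical. The descriptor list can change between RDKit releases, so the number of columns depends on the installed version.

```python
import pandas as pd
from rdkit import Chem
from rdkit.Chem import Descriptors
from rdkit.ML.Descriptors import MoleculeDescriptors


def compute_rdkit_2d(smiles_list):
    """Return a DataFrame with one row per SMILES and one column per RDKit 2D descriptor."""
    # All descriptors registered in the installed RDKit build
    names = [name for name, _ in Descriptors._descList]
    calc = MoleculeDescriptors.MolecularDescriptorCalculator(names)
    rows = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:  # RDKit returns None for SMILES it cannot parse
            raise ValueError(f"Could not parse SMILES: {smi!r}")
        rows.append(calc.CalcDescriptors(mol))
    return pd.DataFrame(rows, columns=list(calc.GetDescriptorNames()), index=smiles_list)


if __name__ == "__main__":
    smiles = ["CCO", "c1ccccc1", "CC(=O)O"]  # hypothetical example molecules
    df = compute_rdkit_2d(smiles)
    df.to_csv("rdkit_2d_descriptors.csv", index_label="SMILES")  # hypothetical filename
```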

ECFP6 Fingerprints in Python – Part 3

June 1, 2020 (updated June 8, 2020) · Zin · Cheminformatics

This is Part 3 of the five-part tutorial series accompanying the blog post Computing Molecular Descriptors – Intro, in the context of drug discovery. The goal of this post is to explain the Python code for computing Morgan fingerprints with radius 3, also known as ECFP6 connectivity fingerprints. What Are ECFP Fingerprints? Please read this article and…
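For reference, here is a minimal RDKit sketch of ECFP6-style bit vectors: `rdFingerprintGenerator.GetMorganGenerator` with `radius=3` (radius 3 means diameter 6, hence the "6" in ECFP6) produces folded Morgan fingerprints, which are converted to a NumPy matrix with one row per molecule. The 2048-bit length, the helper name `ecfp6_matrix`, and the example SMILES are assumptions; the post itself may use a different RDKit function or bit length.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator


def ecfp6_matrix(smiles_list, n_bits=2048):
    """Return an (n_molecules, n_bits) 0/1 matrix of radius-3 Morgan (ECFP6-like) fingerprints."""
    # radius=3 corresponds to a diameter of 6; n_bits is an assumed folding length
    gen = rdFingerprintGenerator.GetMorganGenerator(radius=3, fpSize=n_bits)
    rows = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:  # RDKit returns None for SMILES it cannot parse
            raise ValueError(f"Could not parse SMILES: {smi!r}")
        fp = gen.GetFingerprint(mol)  # ExplicitBitVect of length n_bits
        arr = np.zeros((0,), dtype=np.int8)
        DataStructs.ConvertToNumpyArray(fp, arr)  # resizes arr and fills it with the bits
        rows.append(arr)
    return np.vstack(rows)


if __name__ == "__main__":
    X = ecfp6_matrix(["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"])  # hypothetical molecules
    print(X.shape, X.sum(axis=1))  # matrix shape and number of bits set per molecule
```

Each row can then be used directly as a feature vector, for example as the input matrix for the random forest models in the cross-validation posts.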