Interpretable machine learning for molecular design and discovery

Estuari, 333 Broadway, Troy, NY 12180

Target audience: only knowledge of basic machine learning concepts is assumed (i.e., feature vectors, models, train/test splitting).


Custom molecules with specific properties are essential to the development of many important products, such as drugs, batteries, and organic LEDs. Since the synthesis of molecules is difficult, it is important to be able to predict the properties of candidate molecules before attempting synthesis. The prediction of molecular properties has traditionally required complex and computationally expensive quantum mechanical simulations, greatly limiting the speed at which useful molecules can be discovered.

Furthermore, chemical compound space, the space of all stable molecules, is vast: the number of stable organic molecules of drug-like size with “normal” atoms and up to four rings has been estimated at around 10^60. I will discuss several featurization strategies that transform molecules into vectors, and thereby allow molecular property prediction and rapid exploration of chemical compound space using machine learning: fingerprinting, the Coulomb matrix, bag of bonds, sum over bonds, descriptors, and the molecular autoencoder.
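
As a rough illustration of one of these featurizations: the Coulomb matrix, as it is commonly defined in the literature, has diagonal entries 0.5 * Z_i^2.4 and off-diagonal entries Z_i * Z_j / |R_i - R_j|, where Z_i is the nuclear charge of atom i and R_i its position. The sketch below is a minimal pure-Python version under the assumption that coordinates are given in atomic units (bohr). The function names and the two-atom example geometry are hypothetical and are not taken from the talk.

```python
import math


def coulomb_matrix(charges, coords):
    """Return the Coulomb matrix as a list of lists.

    charges: nuclear charges Z_i, one per atom
    coords:  (x, y, z) positions, assumed to be in atomic units (bohr)
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal term: 0.5 * Z_i^2.4
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal term: nuclear repulsion Z_i * Z_j / |R_i - R_j|
                distance = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


def upper_triangle(matrix):
    """Flatten the upper triangle (diagonal included) into a feature vector."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i, n)]


# Hypothetical example: two hydrogen atoms placed 2.0 bohr apart.
h2_charges = [1, 1]
h2_coords = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)]
h2_features = upper_triangle(coulomb_matrix(h2_charges, h2_coords))
print(h2_features)
```

Note that the raw matrix depends on the order in which atoms are listed, so in practice it is usually made order-independent before being used as a feature vector, for example by sorting rows and columns or by using its eigenvalues. That step is omitted here for brevity.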

Data-driven statistical inference and regression have been used to predict molecular properties for decades, in a well-established field known as quantitative structure-property relationships, or QSPR. In the past few years, QSPR has benefited greatly from the development of new machine learning techniques. There is growing interest in whether physical insights relevant for molecular design can be extracted from machine learning models. I will explore this question by comparing different interpretation schemes that have been proposed and examining whether they give consistent results.


Dan Elton is a postdoc at the University of Maryland, College Park, working with Prof. Peter W. Chung and Prof. Mark Fuge. He founded the Tech Valley Machine Learning meetup in November 2016. His website is

We thank Matthew Lean and OneHudson Ventures for letting us use the Estuari coworking space. 

Submitted by Dan Elton

