Autoencoder for Continuous Representation of Discrete Chemical Data





Fuel cells are a promising alternative energy source: they efficiently produce energy by reacting hydrogen and oxygen, with clean water vapor as the only byproduct. However, fuel cell research is held back by the membrane material, which has a limited operating range, an environmentally hazardous synthesis, and a high cost. Investigation of alternative membrane chemistries has been slow because synthesizing new polymers is costly, and only a small number of alternatives have been studied. Here we address this challenge by combining physics-based quantum chemistry with machine learning to accelerate the discovery of novel membrane materials. We trained an autoencoder, a deep learning architecture, to reversibly convert discrete molecular structures into a continuous vector representation that is amenable to machine learning. Training of the autoencoder is coupled with a predictor that estimates chemical properties, including pKa, from this vector space. Combined with computational chemistry methods, this chemical autoencoder allows us to implement search and optimization procedures to discover promising membrane material candidates.
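
The coupling between the autoencoder and the property predictor can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: it assumes molecules are given as SMILES strings, and the toy alphabet `VOCAB`, the sizes `MAX_LEN` and `LATENT_DIM`, the single linear layers, and all function names are hypothetical choices made only for illustration. The sketch shows a discrete string being one-hot encoded, compressed to a continuous latent vector, decoded back to per-character scores, and scored with a joint loss that adds a reconstruction term to a property (pKa) regression term.

```python
import math
import random

# Assumed toy alphabet for SMILES strings; " " is the padding character.
VOCAB = [" ", "C", "N", "O", "S", "F", "(", ")", "=", "1"]
V = len(VOCAB)
MAX_LEN = 12      # assumed fixed string length; inputs must not be longer
LATENT_DIM = 4    # assumed size of the continuous representation

def one_hot(smiles):
    """Discrete structure -> flat one-hot vector of length MAX_LEN * V."""
    vec = []
    for ch in smiles.ljust(MAX_LEN):
        row = [0.0] * V
        row[VOCAB.index(ch)] = 1.0
        vec.extend(row)
    return vec

def to_smiles(scores):
    """Per-position scores -> string, taking the best character at each position."""
    chars = []
    for pos in range(MAX_LEN):
        chunk = scores[pos * V:(pos + 1) * V]
        chars.append(VOCAB[chunk.index(max(chunk))])
    return "".join(chars).rstrip()

def init_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def encode(w_enc, x):
    """Encoder: one linear layer with tanh, giving the latent vector z."""
    return [math.tanh(h) for h in matvec(w_enc, x)]

def decode(w_dec, z):
    """Decoder: one linear layer giving unnormalized character scores."""
    return matvec(w_dec, z)

def predict(w_prop, z):
    """Property head: linear estimate of a scalar property (e.g. pKa) from z."""
    return sum(w * zi for w, zi in zip(w_prop, z))

def joint_loss(params, smiles, target, weight=1.0):
    """Mean cross-entropy reconstruction loss plus weighted squared property error."""
    z = encode(params["enc"], one_hot(smiles))
    scores = decode(params["dec"], z)
    padded = smiles.ljust(MAX_LEN)
    recon = 0.0
    for pos in range(MAX_LEN):
        chunk = scores[pos * V:(pos + 1) * V]
        top = max(chunk)  # subtract the max for a numerically stable log-sum-exp
        log_norm = top + math.log(sum(math.exp(c - top) for c in chunk))
        recon += log_norm - chunk[VOCAB.index(padded[pos])]
    recon /= MAX_LEN
    prop = (predict(params["prop"], z) - target) ** 2
    return recon + weight * prop, recon, prop

rng = random.Random(0)
params = {
    "enc": init_matrix(LATENT_DIM, MAX_LEN * V, rng),
    "dec": init_matrix(MAX_LEN * V, LATENT_DIM, rng),
    "prop": [rng.gauss(0.0, 0.1) for _ in range(LATENT_DIM)],
}

# Acetic acid with an illustrative target pKa of 4.76.
z = encode(params["enc"], one_hot("CC(=O)O"))
total, recon, prop = joint_loss(params, "CC(=O)O", 4.76)
```

Minimizing `total` over all three weight sets at once is what ties the latent space to the property: the encoder is pushed toward vectors that both reconstruct the molecule and carry information the predictor can use. A practical model would replace the linear layers with deeper networks and add a gradient-based training loop, neither of which is shown here.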

Cite this work

Researchers should cite this work as follows:

  • Mariana Rodriguez, Nicolae C Iovanac, Brett Matthew Savoie (2018), "Autoencoder for Continuous Representation of Discrete Chemical Data,"
