Abstract
This tool contains a detailed Jupyter notebook that walks through the stages of training machine learning (ML) models on computational materials science datasets. The problem tackled here is the ML prediction of defect and impurity behavior in semiconductors, specifically Cd-chalcogenides. Using data generated from density functional theory (DFT) calculations on representative systems, several types of materials descriptors, and regression algorithms such as random forests and kernel ridge regression, we train ML models that can accurately predict defect energetics for thousands of new systems and enable high-throughput screening and design. The notebook demonstrates how the DFT data and descriptors are read, how the data are divided into training and test sets, how the standard practices of cross-validation and hyperparameter optimization are applied using the Python package scikit-learn to train ML models, how prediction performance is visualized, how prediction uncertainties are estimated, and finally how the optimized models are deployed. Interested researchers can experiment with the code, tune various parameters, input their own data, and visualize different dimensions of the available properties and descriptors.
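The workflow described above can be sketched in a few lines of scikit-learn. This is a minimal illustration under stated assumptions, not the notebook's actual code: the data are synthetic stand-ins for the DFT descriptors and defect energies, the hyperparameter grids are arbitrary, and the uncertainty estimate (the spread of per-tree predictions in the random forest) is one common choice that may differ from the method used in the notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for descriptors (X) and a target property (y).
# ASSUMPTION: the real notebook reads DFT data and descriptors instead.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - X[:, 1] ** 2 + 0.1 * rng.normal(size=200)

# Divide the data into training and test sets (80/20 split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Random forest: 5-fold cross-validated grid search over hyperparameters.
# The grid values are illustrative only.
rf_search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [5, None]},
    cv=5,
    scoring="neg_root_mean_squared_error",
)
rf_search.fit(X_train, y_train)
rf = rf_search.best_estimator_

# Kernel ridge regression with an RBF kernel, tuned the same way.
krr_search = GridSearchCV(
    KernelRidge(kernel="rbf"),
    param_grid={"alpha": [1e-3, 1e-2, 1e-1], "gamma": [0.01, 0.1, 1.0]},
    cv=5,
    scoring="neg_root_mean_squared_error",
)
krr_search.fit(X_train, y_train)
krr = krr_search.best_estimator_

# Evaluate both tuned models on the held-out test set.
rf_rmse = np.sqrt(mean_squared_error(y_test, rf.predict(X_test)))
krr_rmse = np.sqrt(mean_squared_error(y_test, krr.predict(X_test)))

# Uncertainty sketch: standard deviation across the forest's trees.
tree_preds = np.stack([tree.predict(X_test) for tree in rf.estimators_])
rf_std = tree_preds.std(axis=0)

print("RF best params:", rf_search.best_params_)
print("KRR best params:", krr_search.best_params_)
print(f"RF test RMSE: {rf_rmse:.3f}, KRR test RMSE: {krr_rmse:.3f}")
print(f"Mean per-point RF uncertainty: {rf_std.mean():.3f}")
```

Replacing the synthetic `X` and `y` with real descriptor and property arrays leaves the rest of the sketch unchanged; a parity plot of predicted against reference values on the test set would be the natural next step for visualizing performance.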
Powered by
Python, Scikit-learn
Bio
Arun Mannodi-Kanakkithodi is a computational materials scientist working as a postdoctoral researcher at Argonne National Laboratory.
Publications
A. Mannodi-Kanakkithodi et al., npj Computational Materials 6, 39 (2020).