nanoHUB IGNITE 2021
Machine Learning Challenge
Instructions
Reducing the time and cost associated with the discovery of new materials with unprecedented properties is expected to have significant societal impact. New materials are needed in the fields of energy, transportation, aerospace, and medicine, among others. This challenge uses machine learning tools to reduce the number of experiments needed to achieve a design goal.
This challenge applies active learning to materials discovery using online simulations. Active learning is a subset of machine learning in which the information available at a given time is used to determine the next experiment to carry out in order to achieve a design goal. Here, the goal is to find the alloy with the highest hardness in the fewest experiments.
Students will use a database of properties of a new class of metallic alloys, high entropy alloys, and will be tasked with designing an optimal active learning workflow. Starting from a common set of known materials, the challenge is to find the hardest alloy in the fewest experiments.
Background
Active learning is a subset of machine learning where models learn dynamically by iteratively analyzing existing data, identifying the next experiment expected to maximize the information gained, and querying the selected information source. This workflow is depicted in Figure 1.
Figure 1. Schematic representation of the active learning iterative loop.
These algorithms start with a model trained on an initial set of data. The model is used to evaluate, for every possible new experiment within the design space, the expected gain towards an objective function. The top candidate or candidates are then characterized (e.g. by performing an experiment), the outcome is added to the existing data set, and the model is re-trained. With this iterative process, illustrated in Figure 1, the model becomes more accurate in the regions of the design space that are of interest. The selection strategies used in active learning to identify the next query are known as information acquisition functions; they differ from each other in the relative balance between exploitation and exploration. Exploitation functions favor candidates expected to maximize the objective function. Exploration functions, on the other hand, favor regions where the model uncertainty is high. Figure 2 shows how various acquisition functions perform in finding the material with the highest possible ionic conductivity out of a set of existing experimental results.
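The iterative loop described above can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not the code in the provided notebook: the bootstrap ensemble of linear fits used as the model, the function names (`fit_ensemble`, `predict`, `active_learning`), and the synthetic "property" values are all assumptions made for this example.

```python
import numpy as np

def fit_ensemble(X, y, rng, n_models=20):
    """Fit a bootstrap ensemble of lightly regularized linear models (assumed surrogate)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])           # add an intercept column
    weights = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))      # bootstrap resample
        A, b = Xb[idx], y[idx]
        w = np.linalg.solve(A.T @ A + 1e-3 * np.eye(A.shape[1]), A.T @ b)
        weights.append(w)
    return np.array(weights)

def predict(weights, X):
    """Return the ensemble mean and standard deviation for each row of X."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    preds = Xb @ weights.T                              # shape (n_points, n_models)
    return preds.mean(axis=1), preds.std(axis=1)

def active_learning(X, y, initial_idx, acquisition, seed=0):
    """Count the queries needed until the row with the highest y has been measured."""
    rng = np.random.default_rng(seed)
    known = [int(i) for i in initial_idx]
    target = int(np.argmax(y))
    n_queries = 0
    while target not in known:
        pool = np.setdiff1d(np.arange(len(y)), known)   # candidates not yet measured
        weights = fit_ensemble(X[known], y[known], rng) # re-train on all known data
        mean, std = predict(weights, X[pool])
        pick = int(pool[np.argmax(acquisition(mean, std))])
        known.append(pick)                              # "run the experiment": reveal y[pick]
        n_queries += 1
    return n_queries

# Synthetic stand-in for a materials database (hypothetical data).
rng = np.random.default_rng(0)
X = rng.uniform(size=(60, 3))
y = X @ np.array([3.0, -1.0, 2.0]) + 0.3 * np.sin(6 * X[:, 0])
initial_idx = rng.choice(len(y), size=5, replace=False)

exploit = lambda mean, std: mean                        # pure exploitation
n_queries = active_learning(X, y, initial_idx, exploit)
print(f"Best candidate found after {n_queries} queries")
```

Because each iteration moves one candidate from the pool to the known set, the loop always terminates, in the worst case after every remaining candidate has been queried.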
Figure 2. Comparison of different information acquisition functions. Black dots indicate the initial set. Gray dots represent unknown points that are available for queries. Colored dots indicate the points explored by the active learning approach. The first plot is a summary that tracks the highest value of ionic conductivity queried by each function against the number of experiments.
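To make the exploitation versus exploration balance concrete, the sketch below writes out four acquisition functions that are commonly used in active learning. They are illustrative choices and may not match the functions in the provided notebook; the function names and the toy numbers are assumptions. Each function scores candidates from the model's predicted mean and uncertainty, and the candidate with the highest score is queried next.

```python
import math
import numpy as np

def max_mean(mean, std, best_so_far):
    """Pure exploitation: pick the candidate with the highest predicted value."""
    return mean

def max_uncertainty(mean, std, best_so_far):
    """Pure exploration: pick the candidate the model is least sure about."""
    return std

def upper_confidence_bound(mean, std, best_so_far, kappa=1.0):
    """Mix of both; kappa (an assumed tuning parameter) weights exploration."""
    return mean + kappa * std

def expected_improvement(mean, std, best_so_far):
    """Expected gain over the best known value, assuming Gaussian predictions."""
    std = np.maximum(std, 1e-12)                        # avoid division by zero
    z = (mean - best_so_far) / std
    pdf = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / np.sqrt(2.0)))
    return (mean - best_so_far) * cdf + std * pdf

# Toy predictions for three candidates (hypothetical numbers).
mean = np.array([1.0, 2.0, 1.5])
std = np.array([0.1, 0.1, 2.0])
best_so_far = 1.8

for fn in (max_mean, max_uncertainty, upper_confidence_bound, expected_improvement):
    scores = fn(mean, std, best_so_far)
    print(f"{fn.__name__}: next query is candidate {int(np.argmax(scores))}")
```

In this toy case, pure exploitation selects the candidate with the highest predicted mean, while the other three functions select the highly uncertain third candidate.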
More details about active learning in the context of materials science, including hands-on simulations in nanoHUB, can be found in the following learning module:
https://nanohub.org/resources/34272
Students are strongly encouraged to study this material. Additional information about machine learning in the context of materials is available at:
https://nanohub.org/groups/mlmodules
Students should explore different initial sets of data and assess how the performance of the acquisition functions varies with the initial set.
Users can run interactive code online using nanoHUB; there is no need to download or install any software.
Provided for Students
- Background material provided above
- A database of properties of high entropy alloys to exercise the active learning workflow
- A Jupyter notebook with all the code necessary to get started
- The nanoHUB Jupyter environment with all the necessary libraries
Student Tasks
- Execute the provided Jupyter notebook with active learning to create a baseline performance.
- Modify the code to generate independent sets of initial experiments (10 in total) and calculate the distribution of the number of experiments required to find the alloy with the highest hardness for various acquisition functions.
- Modify the acquisition functions and find the one that minimizes the number of experiments needed to find the best material, averaged over 20 independent random initial sets.
- Document the results in a short report detailing how various acquisition functions affect the average number of experiments required.
- Submit the short report and the Jupyter notebook.
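The second and third tasks amount to repeating the active learning campaign from several independent random initial sets and summarizing the resulting counts. The sketch below shows one way this could be organized. It is a rough outline under stated assumptions, not the notebook's code: `queries_to_best` is a hypothetical stand-in for the notebook's workflow (it uses a small bootstrap ensemble of linear fits as the model), `trial_distribution` is a hypothetical helper, and the data are synthetic.

```python
import numpy as np

def queries_to_best(X, y, initial_idx, acquisition, rng):
    """Stand-in campaign: count queries until the highest-y row has been measured."""
    known = [int(i) for i in initial_idx]
    target = int(np.argmax(y))
    Xb = np.hstack([X, np.ones((len(X), 1))])           # features plus intercept
    while target not in known:
        pool = np.setdiff1d(np.arange(len(y)), known)
        preds = []
        for _ in range(15):                             # bootstrap ensemble of linear fits
            idx = rng.choice(known, size=len(known))
            A, b = Xb[idx], y[idx]
            w = np.linalg.solve(A.T @ A + 1e-3 * np.eye(A.shape[1]), A.T @ b)
            preds.append(Xb[pool] @ w)
        preds = np.array(preds)
        scores = acquisition(preds.mean(axis=0), preds.std(axis=0))
        known.append(int(pool[np.argmax(scores)]))
    return len(known) - len(initial_idx)

def trial_distribution(X, y, acquisition, n_trials=10, n_initial=5, seed=0):
    """Run one campaign per independent random initial set; return the query counts."""
    rng = np.random.default_rng(seed)
    counts = []
    for _ in range(n_trials):
        initial = rng.choice(len(y), size=n_initial, replace=False)
        counts.append(queries_to_best(X, y, initial, acquisition, rng))
    return np.array(counts)

# Synthetic stand-in for the alloy database (hypothetical data).
data_rng = np.random.default_rng(1)
X = data_rng.uniform(size=(50, 3))
y = X @ np.array([3.0, -1.0, 2.0]) + 0.3 * np.sin(6 * X[:, 0])

strategies = {
    "max mean": lambda mean, std: mean,                 # exploitation
    "max uncertainty": lambda mean, std: std,           # exploration
    "mean + std": lambda mean, std: mean + std,         # a simple mix of both
}
results = {name: trial_distribution(X, y, acq) for name, acq in strategies.items()}
for name, counts in results.items():
    print(f"{name}: mean={counts.mean():.1f}, std={counts.std():.1f}, "
          f"min={counts.min()}, max={counts.max()}")
```

Using the same `seed` for every strategy means all strategies start from the same sequence of initial-set draws only for the first trial, since later draws depend on how much randomness each campaign consumes. For a strictly paired comparison, the initial sets could be drawn once up front and reused across strategies.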
Learning objectives. After completing this module, you will be able to:
- Use and modify active learning workflows
- Evaluate different information acquisition functions
- Use active learning to reduce the number of experiments in materials discovery or design
Pre-requisites
- Basic Python programming is required
- College-level math
- Materials science background or interest is desired
nanoHUB Tools to use:
Everything you need for the challenge is available through the following tool:
Additional resources: