## Data Science and Machine Learning

### Simulation Tools

### Citrine Tools for Materials Informatics

The Jupyter notebooks in this tool implement methods developed by Citrine Informatics for materials design. Users can modify the notebooks to explore different models, try new ideas, and adapt them to their own problems. The examples pose materials design as a maximization or minimization problem and seek to solve it with as few experiments as possible.

The notebooks use sequential learning to identify the materials with the highest bulk modulus and the highest ionic conductivity. They all obtain their data from Citrination databases (https://citrination.com/), build models using random forests or neural networks, and compare different information acquisition strategies against random search.
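The loop below is a minimal sketch of sequential learning compared against random search, not the Citrine notebooks' code. To stay self-contained it swaps in a bootstrap ensemble of linear least-squares fits as the surrogate (the notebooks use random forests or neural networks), and it uses an upper-confidence-bound rule (predicted mean plus `kappa` times the ensemble spread) as one example acquisition strategy. The function names, `kappa`, and the synthetic data are assumptions for illustration.

```python
import numpy as np


def ensemble_predict(X_train, y_train, X_pool, n_models=20, rng=None):
    """Mean and spread of bootstrap least-squares fits (a stand-in for a random forest)."""
    rng = np.random.default_rng() if rng is None else rng
    A_train = np.column_stack([X_train, np.ones(len(X_train))])  # add intercept column
    A_pool = np.column_stack([X_pool, np.ones(len(X_pool))])
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X_train), size=len(X_train))  # bootstrap resample
        coef, *_ = np.linalg.lstsq(A_train[idx], y_train[idx], rcond=None)
        preds.append(A_pool @ coef)
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)


def sequential_learning(X, y, strategy="ucb", n_init=5, budget=30, kappa=1.0, seed=0):
    """Count the measurements needed to find argmax(y); None if the budget runs out.

    X holds candidate descriptors; y is the (hidden) property being maximized.
    """
    rng = np.random.default_rng(seed)
    best = int(np.argmax(y))
    measured = [int(i) for i in rng.choice(len(X), size=n_init, replace=False)]
    while best not in measured and len(measured) < budget:
        pool = np.setdiff1d(np.arange(len(X)), measured)  # unmeasured candidates
        if strategy == "random":
            nxt = rng.choice(pool)  # baseline: random search
        else:
            mu, sigma = ensemble_predict(X[measured], y[measured], X[pool], rng=rng)
            nxt = pool[np.argmax(mu + kappa * sigma)]  # upper-confidence-bound pick
        measured.append(int(nxt))
    return len(measured) if best in measured else None
```

To compare strategies fairly, one would average the returned counts over many seeds; for a minimization problem, pass `-y` instead of `y`.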

### Machine Learning for Materials Science: Part 1

Data science and machine learning are playing an increasingly important role in science and engineering, and materials science and engineering is no exception. This online tool provides examples of how these methods are used in materials science through Jupyter notebooks. The notebooks contain step-by-step explanations of the activities and live code that users can modify for hands-on learning. The initial set of tutorials focuses on: i) data query, organization, and visualization; ii) developing a simple linear regression model to explore correlations between materials properties; and iii) neural network models trained to predict materials properties from basic element properties. Suggested activities are included in the Jupyter notebooks.
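As a rough illustration of tutorial ii), the sketch below fits a straight line between two property columns and reports the Pearson correlation coefficient. It is not the tutorial's code; `linear_fit` is a hypothetical helper, and the data in the usage lines are synthetic, not real materials properties.

```python
import numpy as np


def linear_fit(x, y):
    """Least-squares fit y ~ slope * x + intercept, plus the Pearson correlation r."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)  # degree-1 polynomial = straight line
    r = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the 2x2 correlation matrix
    return slope, intercept, r


# Synthetic example: a noisy linear relation between two made-up properties.
rng = np.random.default_rng(0)
prop_a = rng.uniform(0.0, 10.0, size=50)
prop_b = 3.0 * prop_a + 2.0 + rng.normal(scale=0.5, size=50)
slope, intercept, r = linear_fit(prop_a, prop_b)
```

A value of `r` near 1 or -1 indicates a strong linear relationship; a value near 0 means a straight line explains little of the variation.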

### TensorFlow Tutorials

This tool provides a set of tutorials to get started with machine learning using TensorFlow and Keras. The tutorials were taken from TensorFlow (https://www.tensorflow.org/tutorials/), copyright François Chollet, and deployed with minimal modifications. Using nanoHUB resources, users can run the tutorials, modify them, and explore machine learning from any laptop or tablet without downloading or installing any software.

### Notebook: Gaussian process regression in 1D

With this tool, you can perform Gaussian process regression on x-y data. The code makes use of the excellent GPy package.
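The tool relies on GPy (for example, its `GPRegression` model), which also optimizes the kernel hyperparameters. The sketch below is not GPy code: it writes out the standard zero-mean GP posterior equations in plain NumPy with fixed, assumed hyperparameters (`variance`, `lengthscale`, `noise_var`) to show what such a regression computes.

```python
import numpy as np


def rbf(a, b, variance=1.0, lengthscale=1.0):
    """Squared-exponential covariance k(a_i, b_j) for 1D input arrays."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_regression(x_train, y_train, x_test, noise_var=1e-2, **kern):
    """Posterior mean and variance of a zero-mean GP evaluated at x_test."""
    K = rbf(x_train, x_train, **kern) + noise_var * np.eye(len(x_train))
    L = np.linalg.cholesky(K)  # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    K_s = rbf(x_train, x_test, **kern)  # train/test covariances
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf(x_test, x_test, **kern)) - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative round-off values
```

Far from the training data the posterior reverts to the prior (mean 0, variance equal to the kernel `variance`), which is why uncertainty bands widen away from the observations.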

### Gaussian processes 2D

The choice of a function to approximate a given data set is not unique; such a function can be found through different methods, each of which determines the parameters of the model. This tool illustrates how to sample from a Gaussian process to obtain a random function from a process with a given covariance and a mean of zero. Although the samples are distributed around zero, this does not imply a loss of generality, since a nonzero mean can be introduced by adding a mean function to the samples.

The Gaussian process tool takes a set of hyperparameters for a particular covariance function, which is used to calculate a covariance matrix. Because this matrix is positive definite, its Cholesky decomposition can be used to make the sampling process computationally efficient.
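The sampling step can be sketched as follows: factor the covariance matrix as K = L Lᵀ, draw a standard normal vector z, and form f = L z, which has covariance K. Adding a mean function afterwards shifts the samples without changing their covariance. This is a minimal sketch, not the tool's code; the squared-exponential kernel, the small diagonal `jitter` for numerical stability, and all function names are assumptions.

```python
import numpy as np


def se_kernel(X1, X2, variance=1.0, lengthscale=1.0):
    """Squared-exponential covariance for inputs of shape (n, d), e.g. d = 2."""
    sq = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq / lengthscale**2)


def sample_gp(X, n_samples=1, mean_fn=None, jitter=1e-6, seed=None, **hyper):
    """Draw functions f ~ GP(m, k) at the points X using a Cholesky factor."""
    K = se_kernel(X, X, **hyper) + jitter * np.eye(len(X))
    L = np.linalg.cholesky(K)  # K = L @ L.T
    z = np.random.default_rng(seed).standard_normal((len(X), n_samples))
    f = (L @ z).T  # each row is one sample with covariance K
    if mean_fn is not None:
        f = f + mean_fn(X)  # shift the zero-mean samples by m(X)
    return f


# Usage: three random surfaces on a 20 x 20 grid over the unit square.
g = np.linspace(0.0, 1.0, 20)
xx, yy = np.meshgrid(g, g)
grid = np.column_stack([xx.ravel(), yy.ravel()])
F = sample_gp(grid, n_samples=3, lengthscale=0.3, seed=0)
surface = F[0].reshape(20, 20)
```

The Cholesky factor costs O(n³) once; after that, each additional sample needs only a matrix-vector product.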

### DataExplorer Lab

DataExplorerLab is a Python tool for exploring datasets through visualizations. It is based on the Floatview library.

### High Pressure DFT Data

This Rappture tool allows users to retrieve data from DFT simulations for the equations of state and high-pressure properties of a set of transition metals in various crystal structures.
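Equation-of-state data of this kind is often summarized by fitting an analytic P(V) or E(V) form. The description above does not say which form the tool uses; the sketch below assumes the third-order Birch–Murnaghan equation as one common choice. The parameter values in the usage lines are hypothetical, not data from the tool.

```python
def birch_murnaghan_pressure(V, V0, B0, B0_prime):
    """Third-order Birch-Murnaghan pressure P(V); P has the units of B0."""
    x = (V0 / V) ** (2.0 / 3.0)  # x = (V0/V)^(2/3), so x**3.5 = (V0/V)^(7/3)
    return 1.5 * B0 * (x**3.5 - x**2.5) * (1.0 + 0.75 * (B0_prime - 4.0) * (x - 1.0))


def bulk_modulus(V, V0, B0, B0_prime, h=1e-6):
    """Numerical B(V) = -V dP/dV using a central difference with step h * V."""
    dV = h * V
    dP = (birch_murnaghan_pressure(V + dV, V0, B0, B0_prime)
          - birch_murnaghan_pressure(V - dV, V0, B0, B0_prime))
    return -V * dP / (2.0 * dV)


# Hypothetical parameters: V0 in A^3/atom, B0 in GPa, B0' dimensionless.
V0, B0, B0p = 16.0, 160.0, 4.5
p_compressed = birch_murnaghan_pressure(0.9 * V0, V0, B0, B0p)  # pressure at 10% compression
```

At V = V0 the pressure vanishes and B(V0) reduces to B0, which is a quick consistency check on any fitted parameter set.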

### Demo of Loading and Visualizing Proteins from the RCSB Protein Data Bank

This demo shows how to load data from external databases and visualize it inside a Jupyter notebook.
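The loading step can be sketched with the standard library alone: download a structure in PDB format and extract atom coordinates from the fixed-column `ATOM`/`HETATM` records. This is not the demo's code. The download URL pattern is an assumption about the RCSB file server, the helper names are hypothetical, and the 3D visualization widget used in the notebook is not shown.

```python
import urllib.request

# Assumed RCSB download endpoint; check the RCSB documentation before relying on it.
RCSB_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def fetch_pdb(pdb_id):
    """Download a PDB-format file as text (requires network access)."""
    with urllib.request.urlopen(RCSB_URL.format(pdb_id=pdb_id.upper())) as resp:
        return resp.read().decode("utf-8")


def parse_atoms(pdb_text, atom_name=None):
    """Extract atoms from ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        name = line[12:16].strip()  # columns 13-16: atom name
        if atom_name is not None and name != atom_name:
            continue
        atoms.append({
            "name": name,
            "res_name": line[17:20].strip(),  # columns 18-20: residue name
            "chain": line[21],  # column 22: chain identifier
            "res_seq": int(line[22:26]),  # columns 23-26: residue number
            # columns 31-54: x, y, z coordinates in angstroms
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        })
    return atoms
```

For example, `parse_atoms(fetch_pdb(pdb_id), atom_name="CA")` would return the alpha-carbon trace of a protein, which can then be handed to a plotting or molecular-viewer library.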