Hands-On Data Science and Machine Learning in Undergraduate Education

By Alejandro Strachan (1); Saaketh Desai (1); Juan Carlos Verduzco Gastelum (1); Michael N Sakano (1); Zachary D McClure (1); Joseph M. Cychosz (2); Jared Gray West (2)

(1) Materials Engineering, Purdue University, West Lafayette, IN
(2) Network for Computational Nanotechnology, Purdue University, West Lafayette, IN




This series of modules introduces key concepts in data science in the context of applications in materials science and engineering. Each end-to-end module includes:

  • A recorded lecture that introduces each topic and provides background material,
  • A hands-on tutorial with step-by-step instructions to perform interactive online activities and run interactive code,
  • A homework assignment designed to help users explore the concepts using online models and simulations and adapt the code to problems of their own interest.

The modules are self-contained, so they can be incorporated into existing courses or used for self-study.

All interactive computing is performed in the cloud on nanoHUB, so there is no need to download or install any software. All resources are open and free.

Knowledge and Skills

  1. Data handling
  2. Predictive modeling
    • Data visualization – See Module 2 and Module 3
    • Digital representation and descriptors for materials – See Module 3
    • Simple regression models – See Module 4
    • Machine learning models for regression and classification – See Module 5
    • Random forests and decision trees – See Module 7
  3. Decision making
    • Uncertainty quantification – See Module 6
    • Active learning for design of experiments – See Module 7


The interactive computing is performed in Python through Jupyter notebooks. Basic programming skills are required. An introductory tutorial on Jupyter, Python, and plotting is available at: https://nanohub.org/resources/33266

Cite this work

Researchers should cite this work as follows:

  • Alejandro Strachan, Saaketh Desai, Juan Carlos Verduzco Gastelum, Michael N Sakano, Zachary D McClure, Joseph M. Cychosz, Jared Gray West (2020), "Hands-On Data Science and Machine Learning in Undergraduate Education," https://nanohub.org/resources/34285.


Hands-on Learning Modules on Data Science and Machine Learning in Engineering

Module 1: Making Data Accessible, Discoverable and Useful
  • Online Lecture: View
  • Lecture Notes: Notes (pdf)
  • Supplemental Material: YouTube
  • Suggested Exercises: Homework Assignment

Module 2: Querying Materials Data Repositories
  • Online Lecture: View
  • Lecture Notes: Notes (pdf)
  • Supplemental Material: Hands-on Tutorial
  • Suggested Exercises: Homework Assignment

Module 3: Materials Descriptors for Data Science
  • Online Lecture: View
  • Lecture Notes: Notes (pdf)
  • Supplemental Material: Hands-on Tutorial
  • Suggested Exercises: Homework Assignment

Module 4: Linear Regression Models
This module introduces linear regression in the context of materials science and engineering. We will apply linear regression to predict materials properties and to explore correlations between...
  • Online Lecture: View
  • Lecture Notes: Notes (pdf)
  • Supplemental Material: Hands-on Tutorial - Young's Modulus; Hands-on Tutorial - Correlations
  • Suggested Exercises: Homework Assignment - Correlations

Module 5: Neural Networks for Regression and Classification
  • Online Lecture: View
  • Lecture Notes: Notes (pdf)
  • Supplemental Material: Hands-on Tutorial - Regression; Hands-on Tutorial - Classification
  • Suggested Exercises: Homework Assignment - Regression; Homework Assignment - Classification

Module 7: Active Learning for Design of Experiments
  • Online Lecture: View
  • Lecture Notes: Notes (pdf)
  • Supplemental Material: Hands-on Tutorial
  • Suggested Exercises: Homework Assignment
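
To give a flavor of the kind of exercise the Module 4 description points to (fitting a line to predict a property and checking how strongly two quantities correlate), here is a minimal sketch in plain Python. It is not taken from the nanoHUB notebooks: the function name `fit_line`, the variable names, and the data values are all hypothetical, and the numbers are synthetic placeholders rather than measured material properties.

```python
# Minimal one-variable least-squares fit: y ≈ slope * x + intercept.
# Assumption: fit_line and the data below are illustrative only and are
# not part of the course material; the values are synthetic.

def fit_line(xs, ys):
    """Return (slope, intercept, r) for an ordinary least-squares line fit.

    r is the Pearson correlation coefficient between xs and ys.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squared deviations and of cross deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("inputs must not be constant")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Synthetic example: a made-up descriptor and a made-up target property.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
target = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept, r = fit_line(descriptor, target)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")
```

The sketch uses only built-in Python so that it runs in any Jupyter notebook without extra packages. In practice, a library routine such as `numpy.polyfit` would likely be the more convenient choice for the same fit.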