Module 2: Querying Materials Data Repositories

By Zachary D McClure¹; Alejandro Strachan¹

¹ Materials Engineering, Purdue University, West Lafayette, IN



Run the Tool: Querying Data Repositories

This module introduces modern tools for data acquisition, including performing large queries through application programming interfaces (APIs), with hands-on online workflows. Cyber-infrastructure platforms give researchers broad access to materials data, and this module introduces tools to manage, filter, organize, and visualize the resulting datasets. Community contributions to these platforms are critical for accelerating materials development and processing. The module focuses on rapidly accessing these databases, handling the collected datasets, and using them in supervised and unsupervised learning.
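Repository APIs typically return large result sets one page at a time. The sketch below shows one way to loop over pages and collect every matching record. It is a minimal illustration under stated assumptions, not the module's own code: the endpoint `https://example.org/api/materials`, the `page`/`per_page` query parameters, and the `{"data": [...]}` response shape are all hypothetical and must be replaced with the conventions documented by the repository you query. The `fetch` argument is injectable so the pagination logic can be tried offline.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# Hypothetical endpoint: NOT a real repository URL. Replace it with the
# documented REST endpoint of the repository you are querying.
BASE_URL = "https://example.org/api/materials"


def build_query_url(base_url: str, filters: dict, page: int, per_page: int) -> str:
    """Encode filter criteria plus (hypothetical) pagination parameters."""
    params = dict(filters, page=page, per_page=per_page)
    return f"{base_url}?{urllib.parse.urlencode(params)}"


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def query_all(filters: dict,
              fetch: Callable[[str], dict] = http_get_json,
              per_page: int = 100,
              base_url: str = BASE_URL) -> Iterator[dict]:
    """Yield every record matching `filters`, requesting one page at a time.

    Assumes (hypothetically) that each response looks like
    {"data": [...records...]} and that a page shorter than `per_page`
    is the last one.
    """
    page = 1
    while True:
        payload = fetch(build_query_url(base_url, filters, page, per_page))
        records = payload.get("data", [])
        yield from records
        if len(records) < per_page:  # short page: no more results
            break
        page += 1
```

For example, `list(query_all({"elements": "Fe,O"}))` would keep requesting pages until a short page comes back; passing a stub `fetch` function lets you check the loop without network access. Stopping on a short page is a simple heuristic; APIs that report a total count or a next-page cursor allow a more direct stopping rule.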

This end-to-end module is designed to be self-contained and easy to incorporate into existing courses or to use for self-study. The module consists of three components:

This module is part of a series on data science and machine learning for engineering and physical sciences. Users can run interactive code online through nanoHUB without downloading or installing any software.

Learning objectives. After completing this module, you will:

  • Understand data repositories and how to query them
  • Manage data with efficient pandas DataFrames
  • Perform queries tailored to a specific application
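Once records are retrieved, a pandas DataFrame makes it straightforward to filter, sort, and summarize them. The sketch below applies a tailored query locally to a small set of results. The column names (`material_id`, `n_elements`, `band_gap_eV`, `density_g_cm3`), the helper functions `to_dataframe` and `select_candidates`, and the records themselves are hypothetical placeholders, not data or code from any repository or from the module.

```python
import pandas as pd


def to_dataframe(records: list) -> pd.DataFrame:
    """Collect query results (one dict per entry) into a DataFrame indexed by ID."""
    return pd.DataFrame.from_records(records).set_index("material_id")


def select_candidates(df: pd.DataFrame, gap_min: float, gap_max: float,
                      max_density: float) -> pd.DataFrame:
    """Keep entries with band gap in [gap_min, gap_max] and density at most
    max_density, sorted by increasing band gap."""
    mask = (df["band_gap_eV"].between(gap_min, gap_max)
            & (df["density_g_cm3"] <= max_density))
    return df.loc[mask].sort_values("band_gap_eV")


# Synthetic, illustrative records (hypothetical IDs and values, not real data).
records = [
    {"material_id": "hyp-1", "n_elements": 2, "band_gap_eV": 0.0, "density_g_cm3": 7.9},
    {"material_id": "hyp-2", "n_elements": 2, "band_gap_eV": 1.5, "density_g_cm3": 5.2},
    {"material_id": "hyp-3", "n_elements": 3, "band_gap_eV": 3.1, "density_g_cm3": 3.4},
    {"material_id": "hyp-4", "n_elements": 3, "band_gap_eV": 1.1, "density_g_cm3": 2.3},
]

df = to_dataframe(records)
candidates = select_candidates(df, gap_min=1.0, gap_max=2.0, max_density=6.0)
# Mean band gap for each number of constituent elements.
summary = df.groupby("n_elements")["band_gap_eV"].mean()
```

Here `candidates` holds `hyp-4` and `hyp-2` (band gap between 1.0 and 2.0 eV, density at most 6.0 g/cm³), and `summary` gives the mean band gap per number of elements. When an API supports server-side filtering, applying the criteria in the request reduces the data transferred; local filtering in pandas is convenient for refining results that are already downloaded.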



Cite this work

Researchers should cite this work as follows:

  • Zachary D McClure, Alejandro Strachan (2020), "Module 2: Querying Materials Data Repositories,"
