This module introduces modern tools for data acquisition, including large queries made through application programming interfaces (APIs), using hands-on online workflows. Cyber-infrastructure platforms offer unparalleled access to data, and this module introduces tools to manage, filter, organize, and visualize the resulting datasets. Community contributions to these platforms are critical for accelerating materials development and processing. The module focuses on rapidly accessing these databases, handling the collected datasets, and leveraging them for supervised and unsupervised learning techniques.
This end-to-end module is designed to be self-contained and easy to incorporate into existing courses or to use for self-study. The module consists of three components:
- Pre-recorded lecture: introduction to data repositories and APIs
YouTube | Video Download (MP4) | Slides (PDF) | Slides (PPTX)
- Hands-on tutorial using nanoHUB: Querying the Materials Project
Download (PDF) | Download (PPTX)
- Homework Assignment
Download (PDF) | Download (DOCX)
This module is part of a series on data science and machine learning for engineering and physical sciences. Users can run interactive code online using nanoHUB, with no need to download or install any software.
Learning objectives. After completing this module, you will:
- Learn about data repositories and how to query them
- Manage data through efficient Pandas dataframes
- Perform queries tailored to a specific user application
Prerequisites.
- Basic Python programming (see https://nanohub.org/resources/33266)
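The data-management objective listed above (filtering and organizing query results in a Pandas dataframe) can be sketched roughly as follows. This is a minimal illustration, not the tutorial's actual notebook: the records, the column names (`material_id`, `formula`, `band_gap_eV`, `density_g_cm3`), and all numeric values are hypothetical placeholders standing in for whatever a real repository query would return.

```python
import pandas as pd

# Hypothetical records standing in for the list of dictionaries that an API
# query might return. Identifiers and values are placeholders, not real data.
records = [
    {"material_id": "example-1", "formula": "AB", "band_gap_eV": 0.0, "density_g_cm3": 7.9},
    {"material_id": "example-2", "formula": "AB2", "band_gap_eV": 1.2, "density_g_cm3": 2.3},
    {"material_id": "example-3", "formula": "A2B3", "band_gap_eV": 3.4, "density_g_cm3": 5.6},
    {"material_id": "example-4", "formula": "ABC3", "band_gap_eV": 2.1, "density_g_cm3": 4.0},
]

# Load the records into a dataframe: one row per entry, one column per key.
df = pd.DataFrame(records)

# Filter: keep only entries with a nonzero band gap.
with_gap = df[df["band_gap_eV"] > 0.0]

# Organize: sort by band gap (widest first) and renumber the rows.
ranked = with_gap.sort_values("band_gap_eV", ascending=False).reset_index(drop=True)

print(ranked[["material_id", "formula", "band_gap_eV"]])
```

Loading results into a dataframe first, then filtering with a boolean mask, keeps the query step separate from the analysis step, so the same filtering code can be reused when the placeholder records are swapped for real query output.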
Cite this work
Researchers should cite this work as follows: