Supplementary Data for "An unsupervised machine learning based approach to identify efficient spin-orbit torque materials"

By Shehrin Sayed¹, Hannah Kleidermacher², Giulianna Hashemi-Asasi, Cheng-Hsiang Hsu, and Sayeef Salahuddin²

1. Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA
2. University of California, Berkeley


Introduction:

There has been growing interest in materials with large spin-orbit torque (SOT) for many novel applications. In our article [1], which is currently under review, we showed that a machine-learning approach based on a word embedding model can predict new materials that are likely to exhibit strong SOT. This page serves as a repository for the supplementary data for the article; here we provide sample copies of the trained models used in the article.
 

Comment on the training dataset:

The training text corpus contains about one million unlabeled scientific abstracts collected from various materials science, physics, and engineering journals published between 1970 and 2020 by the American Physical Society (harvest.aps.org) and the Institute of Electrical and Electronics Engineers (developer.ieee.org). The journals were selected to encompass sufficient knowledge of materials, physics, and device engineering. The data were collected in chronological order and merged to create the training dataset. No keyword-based search, preprocessing of any form, or manual organization of the data was performed, in order to avoid introducing unintentional bias into the dataset.
 

Comment on model training:

We used the Word2Vec word embedding model implemented in gensim (https://radimrehurek.com/gensim/), with the skip-gram architecture of the neural network, in which a target word is represented as a one-hot encoded vector at the input layer. The network also consists of a single hidden layer and an output layer that performs negative sampling with n = 15. We set the size of the word embedding to 200 according to the nature of the text corpus. The remaining hyperparameters are: window size 7; minimum count threshold 5 (which excludes any word that appears fewer than 5 times in the text corpus); phrase depth 2; minimum number of phrase occurrences 10; phrase importance threshold 15; number of workers 16; minibatch size 10,000; learning rate 0.01; subsampling rate 0.0001; and number of epochs 30.
 

Comment on the scope of the model:

The word embedding model learned key concepts of science and engineering from the text dataset and thereby acquired a broad range of knowledge across various scientific disciplines. However, in our manuscript [1], we use the trained model only in the context of magnetism and spintronics research.

 

User Guide:

Please install Anaconda (Python 3.9) and then install the gensim 3.8 package using one of the following commands:

pip install "gensim==3.8.*"

or

conda install gensim=3.8

 

Then open a Python script and import numpy and gensim:

import numpy as np

from gensim.models import Word2Vec

Load the model from its file path:

w2v_model = Word2Vec.load("C:/Dummy_Folder/model_file_name")

 

How to find similar words:

In order to find similar words for a keyword, use the following syntax:

w2v_model.wv.most_similar("keyword", topn=10)

where the topn parameter sets how many similar words to show.

 

Analogy questions:

In order to ask an analogy question, e.g., "Fe is a representative ferromagnet; which materials are representative semiconductors?", use the following expression:

w2v_model.wv.most_similar(positive=["Fe", "semiconductor"], negative=["ferromagnetic"], topn=10)

 

Doesn't match function:

The "does not match" function can be used to identify which element in a list is most dissimilar from the others; e.g., the result of the following expression is "Ru".

w2v_model.wv.doesnt_match(["Fe", "Co", "Ru"])

 

Calculate the cosine similarity between two vectors:

In order to calculate the cosine similarity between two word vectors, you can use the following built-in function:

w2v_model.wv.similarity("word1", "word2")


However, it can also be computed manually. First, get the word embeddings for the corresponding words:

wrd_embd1 = w2v_model.wv["word1"]
wrd_embd2 = w2v_model.wv["word2"]

Then, the cosine similarity can be calculated using the following expression:

cos_similarity = np.dot(wrd_embd1, wrd_embd2)/(np.linalg.norm(wrd_embd1)*np.linalg.norm(wrd_embd2))
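As a quick sanity check of the manual formula, it can be evaluated on two small hypothetical vectors standing in for the word embeddings; on actual embeddings, the result agrees with the built-in w2v_model.wv.similarity.

```python
import numpy as np

# Hypothetical stand-ins for w2v_model.wv["word1"] and w2v_model.wv["word2"].
wrd_embd1 = np.array([1.0, 0.0, 1.0])
wrd_embd2 = np.array([1.0, 1.0, 0.0])

# Cosine similarity: dot product divided by the product of the norms.
cos_similarity = np.dot(wrd_embd1, wrd_embd2) / (
    np.linalg.norm(wrd_embd1) * np.linalg.norm(wrd_embd2)
)
# Here the dot product is 1 and each norm is sqrt(2), so the result is 0.5.
```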

 

Note for the users:

A part of the data was obtained under the American Physical Society's Research Harvest License and under a non-commercial scope-of-use agreement. The use of these models is therefore restricted to educational and academic purposes only.

For any derivations based on this model, users agree to acknowledge our research group and cite our original article [1] and this nanoHUB page where appropriate.

The model is provided as-is, and technical support of any form is not available. However, users are welcome to email the researcher (shehrinsayeed AT gmail.com) if they have any questions.

We hope you enjoy playing with the model.

 

THIS PAGE IS UNDER DEVELOPMENT.

Sponsored by

This model was developed under a research project supported in part by the Applications and Systems-Driven Center for Energy-Efficient Integrated NanoTechnologies (ASCENT), one of six centers in the Joint University Microelectronics Program (JUMP), a Semiconductor Research Corporation (SRC) program sponsored by the Defense Advanced Research Projects Agency (DARPA), and in part by the U.S. Department of Energy under Contract No. DE-AC02-05-CH11231 within the Non-Equilibrium Magnetic Materials (NEMM) program.

References

[1] S. Sayed, H. Kleidermacher, G. Hashemi-Asasi, C.-H. Hsu, and S. Salahuddin (2022). Unsupervised machine learning-driven search for efficient spin-orbit torque materials. DOI: 10.21203/rs.3.rs-1718292/v1.

Cite this work

Researchers should cite this work as follows:

  • Shehrin Sayed, Hannah Kleidermacher, Giulianna Hashemi-Asasi, Cheng-Hsiang Hsu, Sayeef Salahuddin (2024), "Supplementary Data for "An unsupervised machine learning based approach to identify efficient spin-orbit torque materials"," https://nanohub.org/resources/38611.
