Publications

3-D Inorganic Crystal Structure Generation and Property Prediction via Representation Learning

Published in Journal of Chemical Information and Modeling, 2020

Generative models have been successfully used to synthesize completely novel images, text, music and speech. As such, they present an exciting opportunity for the design of new materials for functional applications. So far, generative deep-learning methods applied to molecular and drug discovery have yet to produce stable and novel 3-D crystal structures across multiple material classes. To that end, we herein present an autoencoder-based generative deep-representation learning pipeline for geometrically optimized 3-D crystal structures that simultaneously predicts the values of eight target properties. The system is highly general, as demonstrated through creation of novel materials from three separate material classes: binary alloys, ternary perovskites and Heusler compounds. Comparison of these generated structures to those optimized via electronic-structure calculations shows that our generated materials are valid and geometrically optimized.

Recommended citation: Court C.J., Yildirim B., Jain A. & Cole J.M. "3-D Inorganic Crystal Structure Generation and Property Prediction via Representation Learning" Journal of Chemical Information and Modeling (accepted for publication) (2020) https://pubs.acs.org/doi/10.1021/acs.jcim.0c00464

Magnetic and superconducting phase diagrams and transition temperatures predicted using text mining and machine learning

Published in npj Computational Materials, 2020

Predicting the properties of materials prior to their synthesis is of great importance in materials science. Magnetic and superconducting materials exhibit a number of unique properties that make them useful in a wide variety of applications, including solid oxide fuel cells, solid-state refrigerants, photon detectors and metrology devices. In all these applications, phase transitions play an important role in determining the feasibility of the materials in question. Here, we present a pipeline for fully integrating data extracted from the scientific literature into machine-learning tools for property prediction and materials discovery. Using advanced natural language processing (NLP) and machine-learning techniques, we successfully reconstruct the phase diagrams of well-known magnetic and superconducting compounds, and demonstrate that it is possible to predict the phase-transition temperatures of compounds not present in the database. We provide the tool as an online open-source platform, forming the basis for further research into magnetic and superconducting materials discovery for potential device applications.

Recommended citation: Court C.J. & Cole J.M. "Magnetic and superconducting phase diagrams and transition temperatures predicted using text mining and machine learning" npj Computational Materials 6, 18 (2020) https://www.nature.com/articles/s41524-020-0287-8

Auto-generated Materials Database of Curie and Néel Temperatures via Semi-supervised Relationship Extraction

Published in Nature Scientific Data, 2018

Large auto-generated databases of magnetic materials properties have the potential for great utility in materials science research. This article presents an auto-generated database of 39,822 records containing chemical compounds and their associated Curie and Néel magnetic phase transition temperatures. The database was produced using natural language processing and semi-supervised quaternary relationship extraction, applied to a corpus of 68,078 chemistry and physics articles. Evaluation of the database shows an estimated overall precision of 73%. Records processed with the text-mining toolkit ChemDataExtractor were assisted by a modified Snowball algorithm, whose original binary relationship extraction capabilities were extended to quaternary relationship extraction. Consequently, its machine-learning component can now train with ≤ 500 seeds, rather than the 4,000 originally used. Data processed with the modified Snowball algorithm achieves 82% precision. Database records are available in MongoDB, CSV and JSON formats, which can easily be read using Python, R, Java and MATLAB. This makes the database easy to query for big-data materials science initiatives and provides a basis for magnetic materials discovery.

Recommended citation: Court C.J. & Cole J.M. "Auto-generated Materials Database of Curie and Néel Temperatures via Semi-supervised Relationship Extraction" Scientific Data 5, 180111 (2018) https://www.nature.com/articles/sdata2018111