This repository contains the code accompanying the publication "Leveraging composition-based energy material descriptors for machine learning models" (https://www.sciencedirect.com/science/article/pii/S2352492823012709).
In particular:
- Folder `Classification` contains the code for training, validating, and testing all classifiers used in this work (ETCs, QEGs, Naive Bayesian) and for assessing their performance (`Classifiers_performances.ipynb`), together with the file containing the predicted probabilities that materials in the testing set belong to class 1 (`classifiers_comparison.xlsx`).
- Folder `Mixed features optimization` contains all `.m` files for finding the optimized mixed features with multi-objective optimization, as a linear or product combination of the original features extracted by means of Matminer. Specifically, run `MAIN.m` after choosing how many features to mix and how many mixed features to produce as output (1 or 2); to find mixed features with single-objective optimization, run `MainSingle.m` instead. Folder `Pareto fronts` contains precomputed Pareto fronts for the examples shown in this work.
- Folder `Regression & invariance` contains the file `ETR&SHAP.ipynb` for training, validating, and testing the ETR model, including the SHAP analysis used to rank the input features, together with the resulting ranking (`SHAP_for_ETR_metallic_mean.xlsx`) and the code for the search of invariant groups (`DNN&invariant_groups.ipynb`).
- File `Coefficients_mixed_variables.xlsx` contains the coefficients for mixing the first 30 or 52 original features extracted by means of Matminer, ordered according to the ranking obtained with SHAP.
- File `Database_construction.ipynb` contains the code for cleaning the original SuperCon database.
- File `predictions_on_MPj_materials.xlsx` contains the probability predictions of the best two classifiers of this work (ETC-vanilla, ETC-SMOTE), of the best QEG-based classifier (QEG 2D-mixed lin), and of the GEV classifier (for $T_{\rm{c}}\geq 35~\textup{K}$) over the $\sim$40,000 materials that are in the Materials Project but not in SuperCon; the 1/0 class prediction is also provided, using the probability threshold that maximizes the $F_1$ score (giving $F_{1, \textup{max}}$).
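
The 1/0 class predictions mentioned above depend on a probability threshold chosen to maximize the $F_1$ score. The following is a minimal sketch of that idea, not the code used in the notebooks of this repository: the function names `f1_score` and `best_f1_threshold` and the toy labels and probabilities are hypothetical and serve only as illustration. It scans every predicted probability as a candidate threshold and keeps the one with the highest $F_1 = 2\,TP / (2\,TP + FP + FN)$.

```python
def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, proba):
    """Try each distinct probability as a threshold; keep the best F1."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(proba)):
        # A material is assigned to class 1 when its probability is >= t.
        y_pred = [1 if p >= t else 0 for p in proba]
        score = f1_score(y_true, y_pred)
        if score > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_t, best_f1 = t, score
    return best_t, best_f1


# Toy data (made up for illustration, not taken from the repository files).
y_true = [0, 0, 1, 0, 1, 1]
proba = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]

threshold, f1 = best_f1_threshold(y_true, proba)
labels = [1 if p >= threshold else 0 for p in proba]
print(threshold, round(f1, 3), labels)
```

In practice the threshold would be selected on labeled data (for example the testing set) and then applied unchanged to unlabeled materials; whether the repository follows exactly this procedure is an assumption.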