This repository accompanies the research paper:
“Enhanced Thermophysical Property Prediction with Uncertainty Quantification using Group Contribution-Gaussian Process Regression”
This repository contains all data, scripts, and results related to the study. Below is a breakdown of the key folders and files:
- The Data_for_Model_Building folder contains all curated and preprocessed datasets (*_fcl.csv) used for model building, for all properties.
- Hvap_data_test_fluorinated_molecules.csv: Test set used to evaluate GCGP ΔHvap model performance on highly fluorinated molecules.
- Model results are located in the Final_Results folder, which includes:
  - Parity plots
  - Numerical model predictions
  - Model performance metrics
  - Model training outputs
  - Results from different random seeds for train/test splits
  - Numerical results from timing tests
- lml_values.csv: A summary of the log marginal likelihood (LML) values collected from the kernel architecture experiments.
- The kernel_sweep_code_and_results folder contains the results of testing multiple kernel designs and model architectures, as detailed in the paper.
- The Data_Visualization_Pretraining and Data_Quality_and_Outlier_Checks folders include:
  - Data analysis plots
  - Visualization figures
  - Outlier detection and data quality assessment
- The Tm_whitenoise_tests folder contains results analyzing the impact of various white noise kernel settings on normal melting temperature (Tm) predictions.
- The LML_plots folder contains LML plots for various model and kernel combinations across all properties.
  - lml_values.csv in the Final_Results folder contains the data used to make the plots in LML_plots, as well as additional data.
- The Code.py folder contains all Python code (*.py) used to analyze data, train models, evaluate model performance, and generate figures.
  - Scripts are descriptively named for easy navigation and use.
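As a starting point for exploring the datasets, the sketch below lists every file in Data_for_Model_Building that follows the *_fcl.csv naming pattern and reads it with the Python standard library. It is only an illustration, not one of the repository's scripts: the helper names `find_datasets` and `read_rows` are hypothetical, the label derived from each file name is assumed (not confirmed) to identify the property, and no column names are assumed.

```python
from pathlib import Path
import csv


def find_datasets(data_dir):
    """Map a short label to each *_fcl.csv file found directly in data_dir."""
    datasets = {}
    for path in sorted(Path(data_dir).glob("*_fcl.csv")):
        # Strip the "_fcl.csv" suffix; the remainder is assumed to name the property.
        label = path.name[: -len("_fcl.csv")]
        datasets[label] = path
    return datasets


def read_rows(csv_path):
    """Read a CSV file into a list of dicts keyed by its header row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # Run from the repository root so the relative folder name resolves.
    for label, path in find_datasets("Data_for_Model_Building").items():
        rows = read_rows(path)
        print(f"{label}: {len(rows)} rows from {path.name}")
```

All values are read as strings, so numeric columns need converting before any modeling; the repository's own scripts remain the reference for how the data are actually loaded.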
If you use this code or data, please cite the corresponding paper.
Feel free to open an issue or pull request if you have questions, suggestions, or contributions!