Authors: Hina Bandukwala, Rafe Chang
This project is part of the Hugging Face Frugal AI Challenge, in which we build an efficient model that classifies quotes relating to climate change into one of eight predefined categories based on the CARDS taxonomy.
The training data consists of ~6000 quotes from various sources, each labeled with one of the eight categories mentioned above. More information about the data can be found here.
We augmented the training data using the nlpaug library to deal with the class imbalance in the original dataset. The augmented training data consists of ~12000 quotes.
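For illustration, the sketch below shows one way such augmented quotes can be generated with nlpaug (synonym substitution via WordNet); the augmenters and parameters we actually used are defined in configs/augmentation_config.yaml and src/augment_train.py, and the file and column names here are assumptions.

```python
# Hypothetical augmentation sketch; see src/augment_train.py for the real script.
import nlpaug.augmenter.word as naw
import pandas as pd

# One possible augmenter; the real choice and its parameters live in augmentation_config.yaml.
aug = naw.SynonymAug(aug_src="wordnet")

# Assumed file layout with "quote" and "label" columns.
train_df = pd.read_csv("data/train/train.csv")

# Oversample under-represented classes by adding one augmented copy of each of their quotes.
counts = train_df["label"].value_counts()
minority = counts[counts < counts.max()].index

new_rows = []
for _, row in train_df[train_df["label"].isin(minority)].iterrows():
    augmented = aug.augment(row["quote"])
    if isinstance(augmented, list):  # newer nlpaug versions return a list of strings
        augmented = augmented[0]
    new_rows.append({"quote": augmented, "label": row["label"]})

pd.concat([train_df, pd.DataFrame(new_rows)], ignore_index=True) \
  .to_csv("data/train/train_augmented.csv", index=False)
```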
The original training and validation datasets as well as the augmented data can be found here.
We used a pretrained DistilBERT model for sequence classification and fine-tuned it on our augmented dataset. We incorporated a learning rate scheduler, applied regularization techniques, and performed hyperparameter tuning to improve the model's performance, achieving 95% accuracy on the validation set. Given the high accuracy, we acknowledge that the model is overfitting to the training data.
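As a rough sketch of this setup (not the exact training script), fine-tuning with the Hugging Face Trainer looks roughly like the following; the actual hyperparameters, scheduler, and regularization settings come from configs/config.yaml and configs/hyperoptim_config.yaml, so the values below are placeholders.

```python
# Simplified fine-tuning sketch; the real logic lives in src/train.py and src/model.py.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=8)  # 8 CARDS categories

# Assumed CSV layout with a "quote" text column and an integer "label" column (0-7).
ds = load_dataset("csv", data_files={"train": "data/train/train_augmented.csv",
                                     "valid": "data/valid/valid.csv"})
ds = ds.map(lambda batch: tokenizer(batch["quote"], truncation=True,
                                    padding="max_length", max_length=256),
            batched=True)

args = TrainingArguments(
    output_dir="models/distilbert_climate",   # hypothetical output path
    learning_rate=2e-5,                       # placeholder; tuned via src/hyperoptim.py
    lr_scheduler_type="linear",               # learning-rate scheduler
    weight_decay=0.01,                        # regularization
    num_train_epochs=3,
    per_device_train_batch_size=16,
    evaluation_strategy="epoch",
)

Trainer(model=model, args=args,
        train_dataset=ds["train"], eval_dataset=ds["valid"]).train()
```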
ClimateDebunk/
├── configs/                        # Configuration files
│   ├── augmentation_config.yaml
│   ├── config.yaml
│   ├── hyperoptim_config.yaml
│   └── quantization_config.yaml
├── data/                           # Data files
│   ├── test/                       # Test data
│   ├── train/                      # Training data
│   └── valid/                      # Validation data
├── models/                         # Trained models
├── notebooks/                      # Jupyter notebooks
│   ├── 01_augment_train.ipynb
│   ├── 02_hyperoptimization.ipynb
│   ├── 03_train_model.ipynb
│   ├── 04_inference.ipynb
│   └── 05_quantization.ipynb
├── src/                            # Source code
│   ├── augment_train.py            # Data augmentation script
│   ├── config.py                   # Configuration utilities
│   ├── data_prep.py                # Data preparation script
│   ├── hyperoptim.py               # Hyperparameter optimization script
│   ├── model.py                    # Model definition
│   ├── quantize.py                 # Model quantization script
│   ├── train.py                    # Training script
│   └── utils.py                    # Utility functions
├── tests/                          # Test files
│   ├── test_augment_train.py
│   ├── test_config.py
│   ├── test_data_prep.py
│   ├── test_hyperoptim.py
│   ├── test_model.py
│   ├── test_quantize.py
│   ├── test_train.py
│   └── test_utils.py
├── environment.yml                 # Conda environment file
└── README.md                       # Project documentation
To set up the project, follow these steps:
- Clone the repository:
git clone https://github.com/yourusername/ClimateDebunk.git
cd ClimateDebunk
- Use the environment.yml file to set up a conda environment with all the required dependencies:
conda env create -f environment.yml
conda activate climate_debunk
- To ensure that the functions are working as expected, run the tests. From the root directory, run the following command:
pytest tests/
The project is set up such that you can use the model for inference on your own data. To do so, follow these steps:
- Populate configuration file
Fill in the required information in the config.yaml file:
- testpath: "path/to/test/data" # path to your test data
- test_label_col: "label" # column name for label data
- trained_model_path: "path/to/trained/model" # path to trained model
- Run the inference notebook
This notebook can be found at notebooks/04_inference.ipynb. It will load the trained model and perform inference on the test data.
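If you would rather script the inference step than run the notebook, the core of it boils down to something like the sketch below; the paths come from your config.yaml, and the "quote" column name is an assumption about the test-data layout.

```python
# Standalone inference sketch mirroring notebooks/04_inference.ipynb.
import pandas as pd
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_path = "path/to/trained/model"        # trained_model_path from config.yaml
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path)
model.eval()

test_df = pd.read_csv("path/to/test/data")  # testpath from config.yaml
enc = tokenizer(list(test_df["quote"]), truncation=True, padding=True,
                max_length=256, return_tensors="pt")

with torch.no_grad():
    logits = model(**enc).logits
test_df["predicted_label"] = logits.argmax(dim=-1).tolist()
print(test_df[["quote", "predicted_label"]].head())
```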
We worked on quantizing the model to reduce its size and make it more efficient for deployment. However, we were unable to complete this task due to time constraints. The notebook for quantization can be found at notebooks/05_quantization.ipynb.
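For context, one common route for shrinking a model like this is PyTorch post-training dynamic quantization; the sketch below only illustrates that general idea and does not reflect the (unfinished) contents of the notebook.

```python
# Illustrative dynamic-quantization sketch; not the final pipeline from the notebook.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("path/to/trained/model")

# Convert Linear layers to int8 weights to reduce model size and speed up CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)

torch.save(quantized.state_dict(), "models/distilbert_climate_quantized.pt")  # hypothetical path
```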