This repository is the official implementation of Importance Weighting for Aligning Language Models under Deployment Distribution Shift.
Note: The code was tested on a computer with eight NVIDIA A100-SXM4-40GB GPUs.
See install.sh for installation instructions.
We recommend preparing the dataset as a JSON file, where each example consists of the following data fields:
{
  "prompt": [
    {
      "role": "user",
      "content": "..."
    }
  ],
  "output_A": [
    {
      "role": "assistant",
      "content": "..."
    }
  ],
  "output_B": [
    {
      "role": "assistant",
      "content": "..."
    }
  ],
  "label": 1,
  "reward_A": 1,
  "reward_B": 0,
  "reward_difference": 1,
  "type": "pairwise_feedback",
  "split": "train",
  "origin": "test"
}
Set "split" to "train" for training examples, "validation" for validation examples, and "test" for test examples. Validation examples must be labeled "train" if you intend to include the validation data during SFT/DPO training.
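As an illustration of this schema, here is a minimal, hypothetical sketch that writes such a JSON file with Python's standard library. The example contents, the file name, and the choice of a single JSON list (rather than, say, JSON Lines) are placeholders and assumptions, not part of the repository:

```python
import json

# Hypothetical examples following the schema above (contents are placeholders).
examples = [
    {
        "prompt": [{"role": "user", "content": "How do I stay safe online?"}],
        "output_A": [{"role": "assistant", "content": "Use strong, unique passwords ..."}],
        "output_B": [{"role": "assistant", "content": "Just click any link you receive ..."}],
        "label": 1,                      # preference label, mirroring the example above
        "reward_A": 1,
        "reward_B": 0,
        "reward_difference": 1,
        "type": "pairwise_feedback",
        "split": "train",                # "train", "validation", or "test"
        "origin": "test",
    },
]

# Validation examples must use "split": "train" if they should be included
# during SFT/DPO training (see the note above).
with open("examples/my_dataset_feedback.json", "w") as f:
    json.dump(examples, f, indent=2)
```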
As an example, we provide a script to generate training, validation, and test datasets from SafeRLHF, located at process_datasets/saferlhf.py. You can run the script using the following command:
python -m process_datasets.saferlhf
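To sanity-check a generated dataset, you can count how many examples fall into each split. The sketch below assumes the script wrote one of the JSON files referenced in the commands further down and that the file contains a JSON list of example objects in the schema above; adjust the path if your setup differs:

```python
import json
from collections import Counter

# Path taken from the SFT command below; change it if your script writes elsewhere.
with open("examples/saferlhf_combined-train-val_feedback.json") as f:
    examples = json.load(f)

print(len(examples), "examples")
print(Counter(ex["split"] for ex in examples))
```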
Before proceeding, please
- log in to Hugging Face (e.g., via huggingface-cli login) or provide your Hugging Face token,
- make sure you have access to the Hugging Face models you want to use, such as Llama-3.1-8B-Instruct, since some models are gated and require requesting access,
- edit accelerate_config/fsdp.yaml according to your needs and hardware configuration,
- set up your Weights & Biases (wandb) account and log in (e.g., via wandb login).
See config/model for the available model configurations.
accelerate launch --config_file accelerate_config/fsdp.yaml --main_process_port 29500 launch.py \
n_epochs=1 loss=sft model=pythia datasets=[examples/saferlhf_combined-train-val_feedback.json] exp_name=safe_lm_SFT seed=1 ++cache_dir=.cache/data/models ++model.name_or_path=EleutherAI/pythia-2.8b ++lr=5e-6 ++loss.beta=0.1 model.batch_size=32 model.eval_batch_size=32 model.max_length=512 model.max_prompt_length=256
Note that SFT is optional. The IW-DPO command below loads the resulting SFT checkpoint via ++model.load_from=.cache/data/models/safe_lm_SFT/FINAL; if you skip SFT, adjust or remove that override accordingly.
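Before launching IW-DPO from an SFT checkpoint, you can quickly confirm that the checkpoint directory referenced by ++model.load_from exists. This is only a convenience sketch; the path follows from the exp_name and ++cache_dir values used above:

```python
from pathlib import Path

# Path implied by exp_name=safe_lm_SFT and ++cache_dir=.cache/data/models above.
ckpt = Path(".cache/data/models/safe_lm_SFT/FINAL")
print(ckpt, "exists" if ckpt.exists() else "is missing -- run SFT first or adjust ++model.load_from")
```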
In the command below, the first continuation line carries the general configuration and the final line carries the IW-DPO-specific options.
accelerate launch --config_file accelerate_config/fsdp.yaml --main_process_port 29500 launch.py \
n_epochs=1 loss=dpo model=pythia exp_name=safe_lm_IW-DPO seed=1 ++cache_dir=.cache/data/models ++model.name_or_path=EleutherAI/pythia-2.8b ++lr=5e-6 ++loss.beta=0.1 model.batch_size=32 model.eval_batch_size=32 model.max_length=512 model.max_prompt_length=256 ++model.load_from=.cache/data/models/safe_lm_SFT/FINAL \
datasets=[examples/saferlhf_separated-train-val_feedback.json] model.val_batch_size=32 iw.enabled=true iw.t=reward iw.warmup_examples=1024 iw.kernel_width=null iw.lambda_reg=0.1 iw.normalize_w=true iw.method=kmm
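For intuition about the iw.* options (iw.kernel_width, iw.lambda_reg, iw.normalize_w, iw.method=kmm), below is a minimal, unconstrained kernel mean matching sketch with a Gaussian kernel and ridge regularization. It is illustrative only and is not the estimator implemented in this repository: the function names are our own, and the feature extraction, constraints, normalization, and defaults used by the actual code may differ.

```python
import numpy as np

def gaussian_kernel(X, Y, width):
    # k(x, y) = exp(-||x - y||^2 / (2 * width^2))
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def kmm_weights(X_train, X_target, width=1.0, lambda_reg=0.1, normalize=True):
    """Unconstrained, ridge-regularized kernel mean matching.

    Solves (K + lambda * I) w = kappa, where K compares training points with each
    other and kappa matches training points to the target (deployment) distribution.
    The full KMM formulation additionally imposes non-negativity and box constraints.
    """
    n_tr, n_te = len(X_train), len(X_target)
    K = gaussian_kernel(X_train, X_train, width)
    kappa = (n_tr / n_te) * gaussian_kernel(X_train, X_target, width).sum(axis=1)
    w = np.linalg.solve(K + lambda_reg * np.eye(n_tr), kappa)
    w = np.clip(w, 0.0, None)            # keep weights non-negative
    if normalize:
        w = w * n_tr / w.sum()           # rescale to mean 1 (the repo's normalize_w may differ)
    return w

# Toy usage: 1-D features, with the target distribution shifted relative to training.
rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(256, 1))
X_target = rng.normal(1.0, 1.0, size=(256, 1))
w = kmm_weights(X_train, X_target, width=1.0, lambda_reg=0.1)
print(w.mean(), w[X_train[:, 0] > 1].mean())  # weights grow where the target is denser
```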
python -m train.sample .cache/data/models/safe_lm_IW-DPO/FINAL --gpu_count 8 --output_file outputs/generations/safe_lm_IW-DPO.json --datasets examples/saferlhf_only-test_feedback.json --mode safe_lm --max_tokens 512 --max_prompt_length 256
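The schema of the generations file is determined by train.sample and is not documented here; the sketch below simply loads the JSON and reports its size and top-level structure without assuming a particular format:

```python
import json

# Hypothetical quick check of the sampling output produced by the command above.
with open("outputs/generations/safe_lm_IW-DPO.json") as f:
    generations = json.load(f)

print(type(generations).__name__, "with", len(generations), "entries")
if isinstance(generations, list) and generations and isinstance(generations[0], dict):
    print("fields of the first record:", sorted(generations[0]))
```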
Please set the OPENAI_API_KEY environment variable to your OpenAI API key before running the evaluation.
python -m train.evaluate --input_file outputs/generations/safe_lm_IW-DPO.json --task safe_lm --evaluator gpt-4o-mini
Our code is based on ContextualAI's HALOs.
