Reproduction & enhancement of SIR Watermark (ICLR 2024) for Large Language Models

airman-and/NLP_Watermark_Model_Project

SIR Watermark - Semantic Invariant Robust Watermark for LLMs

A reproduction and enhancement of the paper "A Semantic Invariant Robust (SIR) Watermark for Large Language Models" (ICLR 2024).

📚 Reference Repositories

This project references the following repositories:

⚠️ Important Note on Implementation Differences

While we referenced the above repositories, we found that both implementations differ significantly from the paper's original specifications in several aspects:

  • Optimizer: Different optimization functions used
  • Parameter Size: Different model parameter dimensions
  • Model Architecture: Different architectural choices
  • Output Dimension: Different output dimensions
  • Other Implementation Details: Various other discrepancies

Due to these differences, we directly designed and implemented the architecture as specified in the paper, ensuring alignment with the original research. Additionally, due to time constraints, we used a subset of the dataset mentioned in the paper for training and evaluation, while maintaining the core methodology and evaluation metrics.

📋 Features

  • SIR Baseline: Original TransformModel (Paper settings)
  • SIR Enhanced: Improved architecture with LayerNorm, GELU, Dropout
  • KGW Comparison: Compare with Kirchenbauer et al.'s Green/Red List watermark
  • Full Pipeline: Training → Evaluation → Inference in one script
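To make the "SIR Enhanced" feature concrete, here is a minimal sketch of what a transform model augmented with LayerNorm, GELU, and Dropout can look like. The layer dimensions (768 → 512 → 300) and dropout rate are placeholders for illustration, not the project's actual settings; see Group6-01-full.ipynb for the real architecture.

```python
import torch
import torch.nn as nn

class EnhancedTransformModel(nn.Module):
    """Illustrative sketch of the Enhanced variant: each hidden block
    adds LayerNorm, GELU, and Dropout on top of a plain linear stack.
    All dimensions here are hypothetical placeholders."""
    def __init__(self, in_dim=768, hidden_dim=512, out_dim=300, p_drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.LayerNorm(hidden_dim),   # added in the Enhanced variant
            nn.GELU(),                  # added in the Enhanced variant
            nn.Dropout(p_drop),         # added in the Enhanced variant
            nn.Linear(hidden_dim, out_dim),
            nn.Tanh(),                  # bound per-token biases to [-1, 1]
        )

    def forward(self, x):
        return self.net(x)

model = EnhancedTransformModel()
bias = model(torch.randn(4, 768))
print(bias.shape)  # torch.Size([4, 300])
```

The Tanh output keeps the per-token bias bounded, which is one common way to stabilize watermark strength; the baseline model would be the same stack without the LayerNorm/GELU/Dropout additions.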

🎥 Demo Presentation Video

Demo Presentation Video: Group6-demo-presentation-video.mp4

This video demonstrates our live SIR watermark system, including:

  • Real-time training demonstration (1 epoch)
  • Watermark generation and detection
  • Multiple scenario tests
  • Performance evaluation results

Please click the link above to watch our demo presentation.

Presentation Slides: Canva Presentation

Please click the link above to view our presentation slides.

⚙️ Environment Setup

Using Conda (Recommended)

If you are using conda, we recommend creating a dedicated environment named sir:

# Create conda environment with Python 3.8+
conda create -n sir python=3.8 -y

# Activate the environment
conda activate sir

# Install dependencies
pip install -r requirements.txt

# Install Jupyter (required for notebooks)
pip install jupyter notebook

Note: After creating the environment, activate it with conda activate sir before running any notebooks or scripts.

Troubleshooting: If conda environment creation fails due to network issues, you can use the base environment (see below).

Using pip (Alternative)

If you prefer using pip directly without conda, or if conda environment creation fails:

# Install dependencies
pip install -r requirements.txt

# Install Jupyter (required for notebooks)
pip install jupyter notebook

Note: If all required packages are already installed in your base environment, you can skip the pip install -r requirements.txt step and only install Jupyter.

📦 Required Files

Note: Due to file size limitations, large model and data files are not included in this repository. You have two options:

Option 1: Download Pre-trained Models and Embeddings (Recommended)

We provide pre-trained models and embeddings via Google Drive:

📥 Download Link: Google Drive - Models & Embeddings

The Google Drive folder contains:

  • models.tar.gz (12.7 MB): Pre-trained Baseline and Enhanced models
  • embeddings.tar.gz (103.5 MB): Training embeddings (WikiText-103)

Step-by-Step Extraction Instructions:

  1. Download the files from Google Drive:

    • Click the link above
    • Download both models.tar.gz and embeddings.tar.gz to your local machine
  2. Navigate to the project root directory:

    cd /path/to/Group6-SIR

    (Replace /path/to/ with your actual project path)

  3. Extract the model files:

    # Extract models to the project root (files already have model/ path in archive)
    tar -xzf models.tar.gz -C .

    This will create:

    • model/enhanced_transform_model.pth
    • model/transform_model_cbert_1000.pth

    Important: Extract to project root (.) not to model/ directory, because the archive already contains the model/ path.

  4. Extract the embeddings:

    # Extract embeddings to the project root (files already have data/embeddings/ path in archive)
    tar -xzf embeddings.tar.gz -C .

    This will create:

    • data/embeddings/train_embeddings_wikitext.txt

    Important: Extract to project root (.) not to data/embeddings/ directory, because the archive already contains the data/embeddings/ path.

  5. Verify the files:

    # Check model files
    ls -lh model/*.pth
    
    # Check embedding file
    ls -lh data/embeddings/*.txt

Expected file sizes after extraction:

  • model/enhanced_transform_model.pth: ~8.0 MB
  • model/transform_model_cbert_1000.pth: ~5.8 MB
  • data/embeddings/train_embeddings_wikitext.txt: ~250 MB

Option 2: Train Models Yourself

If you prefer to train from scratch:

  1. Train the models: Run Group6-01-full.ipynb to train both Baseline and Enhanced models (saves to model/ directory)
  2. Generate embeddings: The training notebook will use embeddings from data/embeddings/train_embeddings_wikitext.txt. If this file is missing, you'll need to generate it first (see training notebook for details)

Note: The evaluation dataset (data/dataset/c4_train_sample.jsonl) is included in this repository as it's relatively small.

🚀 Quick Start

Using Notebooks (Recommended for Submission)

Start Jupyter Notebook:

# If running as root user, use --allow-root flag
jupyter notebook --allow-root

# Or use JupyterLab
jupyter lab --allow-root

Then:

  1. Training & Evaluation: Open and run Group6-01-full.ipynb

    • All code sections are marked with # OWN CODE headers
    • Detailed English comments explain each component
    • Training outputs are preserved for verification
  2. Inference & Demo: Open and run Group6-02-demo.ipynb

    • Demonstrates watermark generation and detection
    • Includes multiple test scenarios

Using Python Script

Important: If using conda, make sure to activate the environment first:

conda activate sir

Then proceed with the following:

# 1. Install dependencies (if not already installed)
pip install -r requirements.txt

# 2. Run inference (generate watermarked text and detect)
python sir_watermark.py --mode inference --prompt "Once upon a time"

# 3. Compare SIR with KGW
python sir_watermark.py --mode compare --sample_size 50

📁 Project Structure

project/
├── Group6-01-full.ipynb       # 🔥 Training & Evaluation notebook (with OWN CODE markers)
├── Group6-02-demo.ipynb       # 🔥 Inference & Demo notebook (with OWN CODE markers)
├── sir_watermark.py          # Main integrated script (Training/Evaluation/Inference/Compare)
├── requirements.txt           # Python dependencies
├── README.md                  # This file
├── data/
│   ├── embeddings/
│   │   └── train_embeddings_wikitext.txt  # Pre-computed embeddings (generate via training notebook)
│   └── dataset/
│       └── c4_train_sample.jsonl          # Evaluation dataset (C4) - included
├── model/
│   ├── transform_model_cbert_1000.pth     # Pre-trained baseline model (train via Group6-01-full.ipynb)
│   └── enhanced_transform_model.pth       # Pre-trained enhanced model (train via Group6-01-full.ipynb)
└── legacy/                    # Legacy scripts (deprecated, use sir_watermark.py instead)
    ├── integrated_training.py
    ├── integrated_evaluation.py
    └── src/                    # Legacy source modules (all functionality in sir_watermark.py)
        ├── models.py
        ├── utils.py
        ├── evaluate.py
        └── train_enhanced.py

📓 Notebooks

Group6-01-full.ipynb

Training & Evaluation Notebook

This notebook contains the complete training and evaluation pipeline:

  • OWN CODE: Utility function definitions (cosine similarity, loss function, penalties)
  • OWN CODE: Model architecture definitions (Baseline and Enhanced Transform models)
  • OWN CODE: Watermark system implementation
  • OWN CODE: Model training (2000 epochs with detailed loss logging)
  • OWN CODE: Model evaluation with comprehensive metrics (F1, TPR, FPR, Accuracy)
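As a rough illustration of the utility functions listed above (our own reading of the loss design, with hypothetical penalty forms, not the notebook's exact code):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def similarity_loss(sim_inputs, sim_outputs):
    """Hypothetical semantic-invariance term: the similarity of two
    transformed outputs should track the similarity of their inputs,
    so similar sentences receive similar watermark biases."""
    return (sim_inputs - sim_outputs) ** 2

def balance_penalty(bias):
    """Hypothetical unbiasedness penalty: per-token biases should
    average to ~0 so the watermark does not skew the distribution."""
    return float(np.mean(bias)) ** 2
```

A perfectly invariant transform gives similarity_loss of 0, and a balanced bias vector (e.g. equal positive and negative mass) gives balance_penalty of 0.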

Key Features:

  • All code sections marked with # OWN CODE headers
  • Detailed English comments explaining "What" and "Why" for each component
  • Training output preserved for verification
  • Evaluation metrics comparable to paper results

Group6-02-demo.ipynb

Inference & Demo Notebook

This notebook demonstrates practical usage scenarios:

  • OWN CODE: Watermark generation demo (injecting watermarks into generated text)
  • OWN CODE: Watermark detection demo (detecting watermarks in text)
  • OWN CODE: Various scenario tests (multiple prompts and situations)

Key Features:

  • Interactive demos showing watermark injection and detection
  • Comparison between watermarked and unwatermarked text
  • Multiple test scenarios for robustness verification
  • Clear English documentation throughout
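The detection demos above boil down to scoring observed tokens against the transform model's biases. A minimal sketch of that idea (the threshold is a placeholder, not the notebook's tuned value):

```python
import numpy as np

def detect_watermark(token_biases, threshold=0.2):
    """Illustrative SIR-style detection: average the per-token bias
    values that the transform model assigns to the observed tokens.
    Watermarked text should score well above 0; clean text near 0.
    The threshold here is a hypothetical placeholder."""
    score = float(np.mean(token_biases))
    return score, score > threshold

score, flagged = detect_watermark([0.8, 0.9, 0.7, 0.85])
print(score, flagged)  # 0.8125 True
```

Unwatermarked text yields biases scattered around zero, so its mean score stays below the threshold.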

🔧 Usage

Notebook Usage

Group6-01-full.ipynb (Training & Evaluation):

  1. First time: Run all cells sequentially to train models
    • Baseline model training (2000 epochs) → saves to model/transform_model_cbert_1000.pth
    • Enhanced model training (2000 epochs) → saves to model/enhanced_transform_model.pth
    • Training takes several hours depending on hardware
  2. After training: Models are saved and can be reused
  3. Evaluation will output comprehensive metrics (F1, TPR, FPR, Accuracy)
  4. All outputs are preserved for submission

Group6-02-demo.ipynb (Inference & Demo):

  1. Requires trained models: Must run Group6-01-full.ipynb first to generate model files
  2. Demonstrates watermark generation and detection
  3. Includes multiple test scenarios

Script Usage

Training

# Train enhanced model (2000 epochs)
python sir_watermark.py --mode train --model enhanced --epochs 2000

# Train baseline model
python sir_watermark.py --mode train --model baseline --epochs 2000

Evaluation

# Evaluate a trained model
python sir_watermark.py --mode evaluate --model_path model/enhanced_transform_model.pth --sample_size 50

Inference

# Generate watermarked text and detect
python sir_watermark.py --mode inference --prompt "Your prompt here" --max_tokens 100

Compare with KGW

# Compare SIR vs KGW (requires MarkLLM)
git clone https://github.com/THU-BPM/MarkLLM.git ../MarkLLM
python sir_watermark.py --mode compare --sample_size 50
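For readers unfamiliar with the KGW baseline, here is a sketch of its Green/Red List idea (Kirchenbauer et al.): seed an RNG from the previous token, partition the vocabulary, and detect via a z-score on green-token counts. The hash constant below is an arbitrary illustration, not the value MarkLLM uses.

```python
import numpy as np

def green_list(prev_token_id, vocab_size, gamma=0.5, hash_key=15485863):
    """Sketch of KGW-style partitioning: seed an RNG with a hash of the
    previous token, permute the vocabulary, and take the first gamma
    fraction as the 'green' list (boosted at generation time)."""
    rng = np.random.default_rng(hash_key * prev_token_id % (2**31))
    perm = rng.permutation(vocab_size)
    return set(perm[: int(gamma * vocab_size)].tolist())

def z_score(green_hits, total_tokens, gamma=0.5):
    """Detection statistic: observed green-token count versus the
    gamma * T count expected by chance."""
    expected = gamma * total_tokens
    var = total_tokens * gamma * (1 - gamma)
    return (green_hits - expected) / var ** 0.5

print(z_score(90, 100))  # 8.0 -> strong evidence of a watermark
```

Because the partition depends only on surface token identities, paraphrasing can destroy it; SIR's biases instead depend on sentence-level semantics, which motivates the comparison.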

Full Pipeline

# Train → Evaluate → Inference
python sir_watermark.py --mode all --epochs 2000

📊 Results

Method        F1 Score  TPR     FPR     Accuracy
KGW           1.0000    1.0000  0.0000  1.0000
SIR Enhanced  0.9495    0.9400  0.0400  0.9500
SIR Baseline  0.9474    0.9000  0.0000  0.9500

Note: KGW achieves perfect scores in this attack-free setting but is vulnerable to text modifications; SIR is more robust to semantics-preserving attacks such as paraphrasing.
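The table's metrics follow the standard confusion-matrix definitions. As a sanity check, with 50 positive and 50 negative samples the SIR Enhanced row is consistent with 47 true positives, 3 false negatives, 2 false positives, and 48 true negatives (our own illustration, not logged counts):

```python
def metrics(tp, fn, fp, tn):
    """Standard binary-classification metrics used in the table."""
    tpr = tp / (tp + fn)                    # true positive rate (recall)
    fpr = fp / (fp + tn)                    # false positive rate
    acc = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    return round(f1, 4), round(tpr, 4), round(fpr, 4), round(acc, 4)

# Counts consistent with the SIR Enhanced row (illustrative only)
print(metrics(tp=47, fn=3, fp=2, tn=48))  # (0.9495, 0.94, 0.04, 0.95)
```

The SIR Baseline row is likewise consistent with 45/5/0/50, which reproduces F1 = 0.9474 exactly.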

⚙️ Options

Option         Description                                    Default
--mode         train / evaluate / inference / compare / all   all
--model        baseline / enhanced                            enhanced
--epochs       Training epochs                                2000
--llm_model    LLM for generation                             facebook/opt-1.3b
--sample_size  Evaluation samples                             50
--prompt       Inference prompt                               "Once upon a time..."
--delta        Watermark strength                             1.0

📚 References

📝 License

This project is for educational and research purposes.
