Reproduction & enhancement of SIR Watermark (ICLR 2024) for Large Language Models

airman-and/NLP_Watermark_Model_Project

SIR Watermark - Semantic Invariant Robust Watermark for LLMs

A reproduction and enhancement of the paper "A Semantic Invariant Robust (SIR) Watermark for Large Language Models" (ICLR 2024).

📚 Reference Repositories

This project references the following repositories:

⚠️ Important Note on Implementation Differences

While we referenced the above repositories, we found that both implementations differ significantly from the paper's original specifications in several aspects:

  • Optimizer: Different optimization functions used
  • Parameter Size: Different model parameter dimensions
  • Model Architecture: Different architectural choices
  • Output Dimension: Different output dimensions
  • Other Implementation Details: Various other discrepancies

Due to these differences, we directly designed and implemented the architecture as specified in the paper, ensuring alignment with the original research. Additionally, due to time constraints, we used a subset of the dataset mentioned in the paper for training and evaluation, while maintaining the core methodology and evaluation metrics.

📋 Features

  • SIR Baseline: Original TransformModel (Paper settings)
  • SIR Enhanced: Improved architecture with LayerNorm, GELU, Dropout
  • KGW Comparison: Compare with Kirchenbauer et al.'s Green/Red List watermark
  • Full Pipeline: Training → Evaluation → Inference in one script
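To make the "SIR Enhanced" feature concrete, here is a minimal sketch of what a transform model augmented with LayerNorm, GELU, and Dropout can look like. The layer dimensions (768 → 512 → 300) and dropout rate are placeholders for illustration, not the project's actual settings; see Group6-01-full.ipynb for the real architecture.

```python
import torch
import torch.nn as nn

class EnhancedTransformModel(nn.Module):
    """Illustrative sketch of the Enhanced variant: each hidden block
    adds LayerNorm, GELU, and Dropout on top of a plain linear stack.
    All dimensions here are hypothetical placeholders."""
    def __init__(self, in_dim=768, hidden_dim=512, out_dim=300, p_drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.LayerNorm(hidden_dim),   # added in the Enhanced variant
            nn.GELU(),                  # added in the Enhanced variant
            nn.Dropout(p_drop),         # added in the Enhanced variant
            nn.Linear(hidden_dim, out_dim),
            nn.Tanh(),                  # bound per-token biases to [-1, 1]
        )

    def forward(self, x):
        return self.net(x)

model = EnhancedTransformModel()
bias = model(torch.randn(4, 768))
print(bias.shape)  # torch.Size([4, 300])
```

The Tanh output keeps the per-token bias bounded, which is one common way to stabilize watermark strength; the baseline model would be the same stack without the LayerNorm/GELU/Dropout additions.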

🎥 Demo Presentation Video

Demo Presentation Video: Group6-demo-presentation-video.mp4

This video demonstrates our live SIR watermark system, including:

  • Real-time training demonstration (1 epoch)
  • Watermark generation and detection
  • Multiple scenario tests
  • Performance evaluation results

Please click the link above to watch our demo presentation.

Presentation Slides: Canva Presentation

Please click the link above to view our presentation slides.

⚙️ Environment Setup

Using Conda (Recommended)

If you are using conda, we recommend creating a dedicated environment named sir:

# Create conda environment with Python 3.8+
conda create -n sir python=3.8 -y

# Activate the environment
conda activate sir

# Install dependencies
pip install -r requirements.txt

# Install Jupyter (required for notebooks)
pip install jupyter notebook

Note: After creating the environment, activate it with conda activate sir before running any notebooks or scripts.

Troubleshooting: If conda environment creation fails due to network issues, you can use the base environment (see below).

Using pip (Alternative)

If you prefer using pip directly without conda, or if conda environment creation fails:

# Install dependencies
pip install -r requirements.txt

# Install Jupyter (required for notebooks)
pip install jupyter notebook

Note: If all required packages are already installed in your base environment, you can skip the pip install -r requirements.txt step and only install Jupyter.

📦 Required Files

Note: Due to file size limitations, large model and data files are not included in this repository. You have two options:

Option 1: Download Pre-trained Models and Embeddings (Recommended)

We provide pre-trained models and embeddings via Google Drive:

📥 Download Link: Google Drive - Models & Embeddings

The Google Drive folder contains:

  • models.tar.gz (12.7 MB): Pre-trained Baseline and Enhanced models
  • embeddings.tar.gz (103.5 MB): Training embeddings (WikiText-103)

Step-by-Step Extraction Instructions:

  1. Download the files from Google Drive:

    • Click the link above
    • Download both models.tar.gz and embeddings.tar.gz to your local machine
  2. Navigate to the project root directory:

    cd /path/to/Group6-SIR

    (Replace /path/to/ with your actual project path)

  3. Extract the model files:

    # Extract models to the project root (files already have model/ path in archive)
    tar -xzf models.tar.gz -C .

    This will create:

    • model/enhanced_transform_model.pth
    • model/transform_model_cbert_1000.pth

    Important: Extract to project root (.) not to model/ directory, because the archive already contains the model/ path.

  4. Extract the embeddings:

    # Extract embeddings to the project root (files already have data/embeddings/ path in archive)
    tar -xzf embeddings.tar.gz -C .

    This will create:

    • data/embeddings/train_embeddings_wikitext.txt

    Important: Extract to project root (.) not to data/embeddings/ directory, because the archive already contains the data/embeddings/ path.

  5. Verify the files:

    # Check model files
    ls -lh model/*.pth
    
    # Check embedding file
    ls -lh data/embeddings/*.txt

Expected file sizes after extraction:

  • model/enhanced_transform_model.pth: ~8.0 MB
  • model/transform_model_cbert_1000.pth: ~5.8 MB
  • data/embeddings/train_embeddings_wikitext.txt: ~250 MB

Option 2: Train Models Yourself

If you prefer to train from scratch:

  1. Train the models: Run Group6-01-full.ipynb to train both Baseline and Enhanced models (saves to model/ directory)
  2. Generate embeddings: The training notebook will use embeddings from data/embeddings/train_embeddings_wikitext.txt. If this file is missing, you'll need to generate it first (see training notebook for details)

Note: The evaluation dataset (data/dataset/c4_train_sample.jsonl) is included in this repository as it's relatively small.

🚀 Quick Start

Using Notebooks (Recommended for Submission)

Start Jupyter Notebook:

# If running as root user, use --allow-root flag
jupyter notebook --allow-root

# Or use JupyterLab
jupyter lab --allow-root

Then:

  1. Training & Evaluation: Open and run Group6-01-full.ipynb

    • All code sections are marked with # OWN CODE headers
    • Detailed English comments explain each component
    • Training outputs are preserved for verification
  2. Inference & Demo: Open and run Group6-02-demo.ipynb

    • Demonstrates watermark generation and detection
    • Includes multiple test scenarios

Using Python Script

Important: If using conda, make sure to activate the environment first:

conda activate sir

Then proceed with the following:

# 1. Install dependencies (if not already installed)
pip install -r requirements.txt

# 2. Run inference (generate watermarked text and detect)
python sir_watermark.py --mode inference --prompt "Once upon a time"

# 3. Compare SIR with KGW
python sir_watermark.py --mode compare --sample_size 50

📁 Project Structure

project/
├── Group6-01-full.ipynb       # 🔥 Training & Evaluation notebook (with OWN CODE markers)
├── Group6-02-demo.ipynb       # 🔥 Inference & Demo notebook (with OWN CODE markers)
├── sir_watermark.py          # Main integrated script (Training/Evaluation/Inference/Compare)
├── requirements.txt           # Python dependencies
├── README.md                  # This file
├── data/
│   ├── embeddings/
│   │   └── train_embeddings_wikitext.txt  # Pre-computed embeddings (generate via training notebook)
│   └── dataset/
│       └── c4_train_sample.jsonl          # Evaluation dataset (C4) - included
├── model/
│   ├── transform_model_cbert_1000.pth     # Pre-trained baseline model (train via Group6-01-full.ipynb)
│   └── enhanced_transform_model.pth       # Pre-trained enhanced model (train via Group6-01-full.ipynb)
└── legacy/                    # Legacy scripts (deprecated, use sir_watermark.py instead)
    ├── integrated_training.py
    ├── integrated_evaluation.py
    └── src/                    # Legacy source modules (all functionality in sir_watermark.py)
        ├── models.py
        ├── utils.py
        ├── evaluate.py
        └── train_enhanced.py

📓 Notebooks

Group6-01-full.ipynb

Training & Evaluation Notebook

This notebook contains the complete training and evaluation pipeline:

  • OWN CODE: Utility function definitions (cosine similarity, loss function, penalties)
  • OWN CODE: Model architecture definitions (Baseline and Enhanced Transform models)
  • OWN CODE: Watermark system implementation
  • OWN CODE: Model training (2000 epochs with detailed loss logging)
  • OWN CODE: Model evaluation with comprehensive metrics (F1, TPR, FPR, Accuracy)
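As a rough illustration of the utility functions listed above (our own reading of the loss design, with hypothetical penalty forms, not the notebook's exact code):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def similarity_loss(sim_inputs, sim_outputs):
    """Hypothetical semantic-invariance term: the similarity of two
    transformed outputs should track the similarity of their inputs,
    so similar sentences receive similar watermark biases."""
    return (sim_inputs - sim_outputs) ** 2

def balance_penalty(bias):
    """Hypothetical unbiasedness penalty: per-token biases should
    average to ~0 so the watermark does not skew the distribution."""
    return float(np.mean(bias)) ** 2
```

A perfectly invariant transform gives similarity_loss of 0, and a balanced bias vector (e.g. equal positive and negative mass) gives balance_penalty of 0.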

Key Features:

  • All code sections marked with # OWN CODE headers
  • Detailed English comments explaining "What" and "Why" for each component
  • Training output preserved for verification
  • Evaluation metrics comparable to paper results

Group6-02-demo.ipynb

Inference & Demo Notebook

This notebook demonstrates practical usage scenarios:

  • OWN CODE: Watermark generation demo (injecting watermarks into generated text)
  • OWN CODE: Watermark detection demo (detecting watermarks in text)
  • OWN CODE: Various scenario tests (multiple prompts and situations)

Key Features:

  • Interactive demos showing watermark injection and detection
  • Comparison between watermarked and unwatermarked text
  • Multiple test scenarios for robustness verification
  • Clear English documentation throughout
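The detection demos above boil down to scoring observed tokens against the transform model's biases. A minimal sketch of that idea (the threshold is a placeholder, not the notebook's tuned value):

```python
import numpy as np

def detect_watermark(token_biases, threshold=0.2):
    """Illustrative SIR-style detection: average the per-token bias
    values that the transform model assigns to the observed tokens.
    Watermarked text should score well above 0; clean text near 0.
    The threshold here is a hypothetical placeholder."""
    score = float(np.mean(token_biases))
    return score, score > threshold

score, flagged = detect_watermark([0.8, 0.9, 0.7, 0.85])
print(score, flagged)  # 0.8125 True
```

Unwatermarked text yields biases scattered around zero, so its mean score stays below the threshold.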

🔧 Usage

Notebook Usage

Group6-01-full.ipynb (Training & Evaluation):

  1. First time: Run all cells sequentially to train models
    • Baseline model training (2000 epochs) → saves to model/transform_model_cbert_1000.pth
    • Enhanced model training (2000 epochs) → saves to model/enhanced_transform_model.pth
    • Training takes several hours depending on hardware
  2. After training: Models are saved and can be reused
  3. Evaluation will output comprehensive metrics (F1, TPR, FPR, Accuracy)
  4. All outputs are preserved for submission

Group6-02-demo.ipynb (Inference & Demo):

  1. Requires trained models: Must run Group6-01-full.ipynb first to generate model files
  2. Demonstrates watermark generation and detection
  3. Includes multiple test scenarios

Script Usage

Training

# Train enhanced model (2000 epochs)
python sir_watermark.py --mode train --model enhanced --epochs 2000

# Train baseline model
python sir_watermark.py --mode train --model baseline --epochs 2000

Evaluation

# Evaluate a trained model
python sir_watermark.py --mode evaluate --model_path model/enhanced_transform_model.pth --sample_size 50

Inference

# Generate watermarked text and detect
python sir_watermark.py --mode inference --prompt "Your prompt here" --max_tokens 100

Compare with KGW

# Compare SIR vs KGW (requires MarkLLM)
git clone https://github.com/THU-BPM/MarkLLM.git ../MarkLLM
python sir_watermark.py --mode compare --sample_size 50
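For readers unfamiliar with the KGW baseline, here is a sketch of its Green/Red List idea (Kirchenbauer et al.): seed an RNG from the previous token, partition the vocabulary, and detect via a z-score on green-token counts. The hash constant below is an arbitrary illustration, not the value MarkLLM uses.

```python
import numpy as np

def green_list(prev_token_id, vocab_size, gamma=0.5, hash_key=15485863):
    """Sketch of KGW-style partitioning: seed an RNG with a hash of the
    previous token, permute the vocabulary, and take the first gamma
    fraction as the 'green' list (boosted at generation time)."""
    rng = np.random.default_rng(hash_key * prev_token_id % (2**31))
    perm = rng.permutation(vocab_size)
    return set(perm[: int(gamma * vocab_size)].tolist())

def z_score(green_hits, total_tokens, gamma=0.5):
    """Detection statistic: observed green-token count versus the
    gamma * T count expected by chance."""
    expected = gamma * total_tokens
    var = total_tokens * gamma * (1 - gamma)
    return (green_hits - expected) / var ** 0.5

print(z_score(90, 100))  # 8.0 -> strong evidence of a watermark
```

Because the partition depends only on surface token identities, paraphrasing can destroy it; SIR's biases instead depend on sentence-level semantics, which motivates the comparison.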

Full Pipeline

# Train → Evaluate → Inference
python sir_watermark.py --mode all --epochs 2000

📊 Results

Method        F1 Score  TPR     FPR     Accuracy
KGW           1.0000    1.0000  0.0000  1.0000
SIR Enhanced  0.9495    0.9400  0.0400  0.9500
SIR Baseline  0.9474    0.9000  0.0000  0.9500

Note: KGW achieves perfect scores in this attack-free setting but is vulnerable to text modifications; SIR is more robust to semantics-preserving attacks such as paraphrasing.
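The table's metrics follow the standard confusion-matrix definitions. As a sanity check, with 50 positive and 50 negative samples the SIR Enhanced row is consistent with 47 true positives, 3 false negatives, 2 false positives, and 48 true negatives (our own illustration, not logged counts):

```python
def metrics(tp, fn, fp, tn):
    """Standard binary-classification metrics used in the table."""
    tpr = tp / (tp + fn)                    # true positive rate (recall)
    fpr = fp / (fp + tn)                    # false positive rate
    acc = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    return round(f1, 4), round(tpr, 4), round(fpr, 4), round(acc, 4)

# Counts consistent with the SIR Enhanced row (illustrative only)
print(metrics(tp=47, fn=3, fp=2, tn=48))  # (0.9495, 0.94, 0.04, 0.95)
```

The SIR Baseline row is likewise consistent with 45/5/0/50, which reproduces F1 = 0.9474 exactly.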

⚙️ Options

Option         Description                                    Default
--mode         train / evaluate / inference / compare / all   all
--model        baseline / enhanced                            enhanced
--epochs       Training epochs                                2000
--llm_model    LLM for generation                             facebook/opt-1.3b
--sample_size  Evaluation samples                             50
--prompt       Inference prompt                               "Once upon a time..."
--delta        Watermark strength                             1.0

📚 References

📝 License

This project is for educational and research purposes.
