A reproduction and enhancement of the paper "A Semantic Invariant Robust (SIR) Watermark for Large Language Models" (ICLR 2024).
This project references the following repositories:
- Robust_Watermark: Original implementation by the paper authors
- MarkLLM: Open-source toolkit for LLM watermarking
While we referenced the above repositories, we found that both implementations differ significantly from the paper's original specifications in several aspects:
- Optimizer: Different optimization functions used
- Parameter Size: Different model parameter dimensions
- Model Architecture: Different architectural choices
- Output Dimension: Different output dimensions
- Other Implementation Details: Various other discrepancies
Due to these differences, we directly designed and implemented the architecture as specified in the paper, ensuring alignment with the original research. Additionally, due to time constraints, we used a subset of the dataset mentioned in the paper for training and evaluation, while maintaining the core methodology and evaluation metrics.
- SIR Baseline: Original TransformModel (Paper settings)
- SIR Enhanced: Improved architecture with LayerNorm, GELU, and Dropout (a minimal sketch follows this list)
- KGW Comparison: Compare with Kirchenbauer et al.'s Green/Red List watermark
- Full Pipeline: Training → Evaluation → Inference in one script
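For orientation, here is a minimal PyTorch sketch of what an enhanced transform model with LayerNorm, GELU, and Dropout could look like. The class name, layer sizes, dropout rate, and output activation are illustrative assumptions, not the exact configuration used in Group6-01-full.ipynb.

```python
# Illustrative sketch only: layer sizes, dropout rate, and the Tanh output
# are assumptions, not the exact architecture trained in the notebooks.
import torch.nn as nn

class EnhancedTransformModel(nn.Module):
    """Maps a semantic embedding of the context to a per-token watermark bias vector."""
    def __init__(self, input_dim=768, hidden_dim=512, output_dim=300, dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.LayerNorm(hidden_dim),   # stabilizes activations during training
            nn.GELU(),                  # smoother non-linearity than ReLU
            nn.Dropout(dropout),        # regularization against overfitting
            nn.Linear(hidden_dim, output_dim),
            nn.Tanh(),                  # keeps biases bounded in [-1, 1]
        )

    def forward(self, x):
        return self.net(x)
```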
Demo Presentation Video: Group6-demo-presentation-video.mp4
This video demonstrates our live SIR watermark system, including:
- Real-time training demonstration (1 epoch)
- Watermark generation and detection
- Multiple scenario tests
- Performance evaluation results
Please click the link above to watch our demo presentation.
Presentation Slides: Canva Presentation
Please click the link above to view our presentation slides.
If you are using conda, we recommend creating a dedicated environment named sir:
# Create conda environment with Python 3.8+
conda create -n sir python=3.8 -y
# Activate the environment
conda activate sir
# Install dependencies
pip install -r requirements.txt
# Install Jupyter (required for notebooks)
pip install jupyter notebook
Note: After creating the environment, activate it with conda activate sir before running any notebooks or scripts.
Troubleshooting: If conda environment creation fails due to network issues, you can use the base environment (see below).
If you prefer using pip directly without conda, or if conda environment creation fails:
# Install dependencies
pip install -r requirements.txt
# Install Jupyter (required for notebooks)
pip install jupyter notebook
Note: If all packages are already installed in your base environment (as shown in the test output), you can skip the installation step and just install Jupyter.
Note: Due to file size limitations, large model and data files are not included in this repository. You have two options:
We provide pre-trained models and embeddings via Google Drive:
📥 Download Link: Google Drive - Models & Embeddings
The Google Drive folder contains:
- models.tar.gz (12.7 MB): Pre-trained Baseline and Enhanced models
- embeddings.tar.gz (103.5 MB): Training embeddings (WikiText-103)

1. Download the files from Google Drive:
   - Click the link above
   - Download both models.tar.gz and embeddings.tar.gz to your local machine

2. Navigate to the project root directory:
   cd /path/to/Group6-SIR
   (Replace /path/to/ with your actual project path)

3. Extract the model files:
   # Extract models to the project root (files already have model/ path in archive)
   tar -xzf models.tar.gz -C .
   This will create:
   - model/enhanced_transform_model.pth
   - model/transform_model_cbert_1000.pth
   Important: Extract to the project root (.), not to the model/ directory, because the archive already contains the model/ path.

4. Extract the embeddings:
   # Extract embeddings to the project root (files already have data/embeddings/ path in archive)
   tar -xzf embeddings.tar.gz -C .
   This will create:
   - data/embeddings/train_embeddings_wikitext.txt
   Important: Extract to the project root (.), not to the data/embeddings/ directory, because the archive already contains the data/embeddings/ path.

5. Verify the files (a loading sketch follows these steps):
   # Check model files
   ls -lh model/*.pth
   # Check embedding file
   ls -lh data/embeddings/*.txt
   Expected file sizes after extraction:
   - model/enhanced_transform_model.pth: ~8.0 MB
   - model/transform_model_cbert_1000.pth: ~5.8 MB
   - data/embeddings/train_embeddings_wikitext.txt: ~250 MB
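After extraction, you can also sanity-check a checkpoint with PyTorch. This is a minimal sketch that assumes the .pth files store a plain state_dict; the actual model classes are defined in Group6-01-full.ipynb.

```python
# Minimal sanity check: load an extracted checkpoint and list its tensors.
# Assumes the .pth file stores a plain state_dict (name -> tensor mapping).
import torch

state = torch.load("model/enhanced_transform_model.pth", map_location="cpu")
print(f"Loaded {len(state)} entries")
for name, tensor in state.items():
    print(f"  {name}: {tuple(tensor.shape)}")
```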
If you prefer to train from scratch:
- Train the models: Run Group6-01-full.ipynb to train both Baseline and Enhanced models (saves to the model/ directory)
- Generate embeddings: The training notebook uses embeddings from data/embeddings/train_embeddings_wikitext.txt. If this file is missing, you'll need to generate it first (see the training notebook for details)
Note: The evaluation dataset (data/dataset/c4_train_sample.jsonl) is included in this repository as it's relatively small.
Start Jupyter Notebook:
# If running as root user, use --allow-root flag
jupyter notebook --allow-root
# Or use JupyterLab
jupyter lab --allow-root
Then:
- Training & Evaluation: Open and run Group6-01-full.ipynb
  - All code sections are marked with # OWN CODE headers
  - Detailed English comments explain each component
  - Training outputs are preserved for verification
- Inference & Demo: Open and run Group6-02-demo.ipynb
  - Demonstrates watermark generation and detection
  - Includes multiple test scenarios
Important: If using conda, make sure to activate the environment first:
conda activate sir
Then proceed with the following:
# 1. Install dependencies (if not already installed)
pip install -r requirements.txt
# 2. Run inference (generate watermarked text and detect)
python sir_watermark.py --mode inference --prompt "Once upon a time"
# 3. Compare SIR with KGW
python sir_watermark.py --mode compare --sample_size 50
project/
├── Group6-01-full.ipynb # 🔥 Training & Evaluation notebook (with OWN CODE markers)
├── Group6-02-demo.ipynb # 🔥 Inference & Demo notebook (with OWN CODE markers)
├── sir_watermark.py # Main integrated script (Training/Evaluation/Inference/Compare)
├── requirements.txt # Python dependencies
├── README.md # This file
├── data/
│ ├── embeddings/
│ │ └── train_embeddings_wikitext.txt # Pre-computed embeddings (generate via training notebook)
│ └── dataset/
│ └── c4_train_sample.jsonl # Evaluation dataset (C4) - included
├── model/
│ ├── transform_model_cbert_1000.pth # Pre-trained baseline model (train via Group6-01-full.ipynb)
│ └── enhanced_transform_model.pth # Pre-trained enhanced model (train via Group6-01-full.ipynb)
└── legacy/ # Legacy scripts (deprecated, use sir_watermark.py instead)
├── integrated_training.py
├── integrated_evaluation.py
└── src/ # Legacy source modules (all functionality in sir_watermark.py)
├── models.py
├── utils.py
├── evaluate.py
└── train_enhanced.py
Training & Evaluation Notebook
This notebook contains the complete training and evaluation pipeline:
- OWN CODE: Utility function definitions (cosine similarity, loss function, penalties); a loss sketch appears after the Key Features list below
- OWN CODE: Model architecture definitions (Baseline and Enhanced Transform models)
- OWN CODE: Watermark system implementation
- OWN CODE: Model training (2000 epochs with detailed loss logging)
- OWN CODE: Model evaluation with comprehensive metrics (F1, TPR, FPR, Accuracy)
Key Features:
- All code sections marked with # OWN CODE headers
- Detailed English comments explaining "What" and "Why" for each component
- Training output preserved for verification
- Evaluation metrics comparable to paper results
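For intuition, here is a minimal sketch of a similarity-alignment loss with simple range and balance penalties, in the spirit of the SIR training objective. The penalty forms, weights, and function names are assumptions, not the code in the notebook.

```python
# Illustrative sketch of a similarity-alignment loss with range/balance penalties.
# Weights, penalty forms, and names are assumptions, not the notebook's code.
import torch.nn.functional as F

def sir_style_loss(embeddings, outputs, range_weight=1.0, balance_weight=1.0):
    """embeddings: (B, D_in) sentence embeddings of the training contexts
    outputs:    (B, D_out) transform-model outputs (per-token watermark biases)"""
    # Pairwise cosine similarities within the batch
    emb_sim = F.cosine_similarity(embeddings.unsqueeze(1), embeddings.unsqueeze(0), dim=-1)
    out_sim = F.cosine_similarity(outputs.unsqueeze(1), outputs.unsqueeze(0), dim=-1)

    # Main term: bias similarities should track embedding similarities
    similarity_loss = F.mse_loss(out_sim, emb_sim)

    # Penalty 1: push each bias toward +/-1 so tokens are clearly favored or disfavored
    range_penalty = (1.0 - outputs.abs()).clamp(min=0.0).mean()

    # Penalty 2: keep the biases roughly zero-mean so about half the tokens are favored
    balance_penalty = outputs.mean(dim=-1).abs().mean()

    return similarity_loss + range_weight * range_penalty + balance_weight * balance_penalty
```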
Inference & Demo Notebook
This notebook demonstrates practical usage scenarios:
- OWN CODE: Watermark generation demo (injecting watermarks into generated text)
- OWN CODE: Watermark detection demo (detecting watermarks in text); a detection sketch appears after the Key Features list below
- OWN CODE: Various scenario tests (multiple prompts and situations)
Key Features:
- Interactive demos showing watermark injection and detection
- Comparison between watermarked and unwatermarked text
- Multiple test scenarios for robustness verification
- Clear English documentation throughout
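For intuition, here is a minimal sketch of bias-score detection: average the watermark bias that the transform model assigns to each observed token, given the semantic embedding of that token's preceding context. The function signature, the mapping from token ids to bias positions, and the threshold are assumptions, not the demo notebook's API.

```python
# Illustrative sketch of bias-score detection. The index mapping from token ids
# to bias positions, the threshold, and all names are assumptions.
import torch

@torch.no_grad()
def detect_watermark(token_ids, context_embeddings, transform_model, threshold=0.2):
    """token_ids: (T,) generated token ids, mapped into the bias vector's index space
    context_embeddings: (T, D_in) semantic embedding of the text preceding each token
    Returns (is_watermarked, mean_score)."""
    biases = transform_model(context_embeddings)              # (T, D_out) per-token bias vectors
    scores = biases[torch.arange(len(token_ids)), token_ids]  # bias assigned to each observed token
    mean_score = scores.mean().item()
    # Watermarked text should show a clearly positive mean bias; unwatermarked text stays near zero
    return mean_score > threshold, mean_score
```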
Group6-01-full.ipynb (Training & Evaluation):
- First time: Run all cells sequentially to train models
  - Baseline model training (2000 epochs) → saves to model/transform_model_baseline.pth
  - Enhanced model training (2000 epochs) → saves to model/enhanced_transform_model.pth
  - Training takes several hours depending on hardware
- After training: Models are saved and can be reused
- Evaluation will output comprehensive metrics (F1, TPR, FPR, Accuracy)
- All outputs are preserved for submission
Group6-02-demo.ipynb (Inference & Demo):
- Requires trained models: Must run Group6-01-full.ipynb first to generate model files
- Demonstrates watermark generation and detection
- Includes multiple test scenarios
# Train enhanced model (2000 epochs)
python sir_watermark.py --mode train --model enhanced --epochs 2000
# Train baseline model
python sir_watermark.py --mode train --model baseline --epochs 2000

# Evaluate a trained model
python sir_watermark.py --mode evaluate --model_path model/enhanced_transform_model.pth --sample_size 50

# Generate watermarked text and detect
python sir_watermark.py --mode inference --prompt "Your prompt here" --max_tokens 100

# Compare SIR vs KGW (requires MarkLLM)
git clone https://github.com/THU-BPM/MarkLLM.git ../MarkLLM
python sir_watermark.py --mode compare --sample_size 50

# Train → Evaluate → Inference
python sir_watermark.py --mode all --epochs 2000

| Method | F1 Score | TPR | FPR | Accuracy |
|---|---|---|---|---|
| KGW | 1.0000 | 1.0000 | 0.0000 | 1.0000 |
| SIR Enhanced | 0.9495 | 0.9400 | 0.0400 | 0.9500 |
| SIR Baseline | 0.9474 | 0.9000 | 0.0000 | 0.9500 |
Note: KGW performs better in an attack-free setting, but it is vulnerable to text modifications; SIR is more robust against semantics-preserving attacks such as paraphrasing.
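As a consistency check, the F1 and Accuracy values in the table follow directly from TPR and FPR under the assumption of a balanced evaluation set (50 watermarked and 50 unwatermarked samples):

```python
# Consistency check: derive F1 and Accuracy from TPR/FPR, assuming a balanced
# evaluation set (50 watermarked + 50 unwatermarked samples).
def metrics_from_rates(tpr, fpr, n_pos=50, n_neg=50):
    tp, fp = tpr * n_pos, fpr * n_neg
    fn, tn = n_pos - tp, n_neg - fp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (n_pos + n_neg)
    return f1, accuracy

print(metrics_from_rates(0.94, 0.04))  # SIR Enhanced -> (~0.9495, 0.95)
print(metrics_from_rates(0.90, 0.00))  # SIR Baseline -> (~0.9474, 0.95)
```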
| Option | Description | Default |
|---|---|---|
| --mode | train / evaluate / inference / compare / all | all |
| --model | baseline / enhanced | enhanced |
| --epochs | Training epochs | 2000 |
| --llm_model | LLM for generation | facebook/opt-1.3b |
| --sample_size | Evaluation samples | 50 |
| --prompt | Inference prompt | "Once upon a time..." |
| --delta | Watermark strength (see the sketch below) | 1.0 |
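For intuition on --delta, here is a minimal sketch of how a watermark-strength parameter could scale the semantic bias added to the LLM's next-token logits at each decoding step. The function name, argument layout, and id-to-bias-index mapping are assumptions, not sir_watermark.py's actual interface.

```python
# Illustrative sketch: applying a delta-scaled semantic bias to next-token logits.
# Names and the vocabulary-to-bias-index mapping are assumptions.
import torch

def apply_watermark_bias(logits, context_embedding, transform_model, bias_index, delta=1.0):
    """logits: (V,) next-token logits from the LLM
    context_embedding: (D_in,) semantic embedding of the generated prefix
    bias_index: (V,) long tensor mapping each vocabulary id to a bias-vector position
    delta: watermark strength (the --delta option)"""
    bias = transform_model(context_embedding.unsqueeze(0)).squeeze(0)  # (D_out,) bias vector
    return logits + delta * bias[bias_index]                           # favored tokens get boosted
```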
- Paper: "A Semantic Invariant Robust Watermark for Large Language Models" (ICLR 2024)
- MarkLLM: https://github.com/THU-BPM/MarkLLM
This project is for educational and research purposes.