Welcome to the official code repository for "Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs".
Your star means a lot to us as we develop this project! ⭐⭐⭐
- [2025/10/15] 🔥 We release the code for quantizing dLLMs!
- [2025/08/20] 🚀 Our paper is available on arXiv!
- We present the first systematic study on quantizing diffusion-based large language models (dLLMs).
- This repository implements state-of-the-art post-training quantization (PTQ) methods for dLLMs, including GPTQ, AWQ, SmoothQuant, QuaRot, and DuQuant.
- We comprehensively investigate the impact of quantization on dLLMs across four key dimensions: bit-width, quantization method, task category, and model architecture (a minimal quantization sketch follows this list).
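For readers new to PTQ, here is a minimal sketch (not the repository's implementation; function and variable names are illustrative) of the round-to-nearest baseline that the methods above improve upon. Weight-only settings such as W4A16 quantize only the weights, weight-activation settings such as W8A8 also quantize activations, and `n_bits` corresponds to the bit-width axis of the study.

```python
import torch

def quantize_rtn(t: torch.Tensor, n_bits: int, per_channel: bool = True) -> torch.Tensor:
    """Fake-quantize `t` with asymmetric uniform (round-to-nearest) quantization."""
    if per_channel:  # one scale/zero-point per output row (typical for weights)
        t_min = t.amin(dim=-1, keepdim=True)
        t_max = t.amax(dim=-1, keepdim=True)
    else:            # one scale/zero-point for the whole tensor (typical for activations)
        t_min, t_max = t.min(), t.max()
    qmax = 2 ** n_bits - 1
    scale = (t_max - t_min).clamp(min=1e-8) / qmax
    zero_point = (-t_min / scale).round()
    q = (t / scale + zero_point).round().clamp(0, qmax)
    return (q - zero_point) * scale  # dequantize back to float ("fake quant")

# e.g. a W4A8 setting: 4-bit per-channel weights, 8-bit per-tensor activations
w, x = torch.randn(4096, 11008), torch.randn(16, 4096)
w_err = (w - quantize_rtn(w, n_bits=4, per_channel=True)).abs().mean()
x_err = (x - quantize_rtn(x, n_bits=8, per_channel=False)).abs().mean()
print(f"mean |dw| = {w_err:.4f}, mean |dx| = {x_err:.4f}")
```

GPTQ and AWQ refine the weight side of this baseline, while SmoothQuant, QuaRot, and DuQuant additionally tame activation outliers before quantization.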
```bash
# Create the environment and install dependencies
conda create -n qdlm python=3.10 -y
conda activate qdlm
git clone https://github.com/FelixMessi/QDLM
cd QDLM
pip install --upgrade qdlm
pip install -r requirements.txt
pip install math-verify==0.8.0 antlr4-python3-runtime==4.11.0 sympy==1.14.0

# Install the evaluation harness
cd ./lm-evaluation-harness && pip install -e .
```

To run evaluation for QuaRot, please download and install fast-hadamard-transform built against your CUDA version.
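As a quick, illustrative sanity check that the CUDA kernel is usable (assuming the package exposes `hadamard_transform(x, scale)` over the last dimension, as documented upstream; requires a CUDA GPU):

```python
import math
import torch
from fast_hadamard_transform import hadamard_transform  # QuaRot dependency

x = torch.randn(2, 4096, device="cuda", dtype=torch.float16)
scale = 1.0 / math.sqrt(x.shape[-1])  # makes the transform orthonormal
y = hadamard_transform(x, scale=scale)
# Applying the orthonormal transform twice should recover the input.
x_rec = hadamard_transform(y, scale=scale)
print(torch.allclose(x, x_rec, atol=1e-2))  # expect True
```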
Please refer to the scripts folder for running different weight-only quantization methods (AWQ, GPTQ) and weight–activation quantization methods (SmoothQuant, QuaRot, DuQuant).
Please download the LLaDA-Base/LLaDA-Instruct or Dream models and replace MODEL_PATH in the scripts with your local paths.
Detailed usage instructions are provided in the corresponding shell scripts; a quick model-loading check is sketched below.
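The snippet below is an illustrative check (not one of the provided scripts) that the directory you substitute for MODEL_PATH loads correctly. LLaDA and Dream ship custom modeling code, so `trust_remote_code=True` is required; the path shown is a placeholder.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_PATH = "/path/to/LLaDA-8B-Instruct"  # placeholder: your local checkpoint directory
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_PATH, trust_remote_code=True, torch_dtype=torch.bfloat16
).eval()
print(model.config)
```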
If you have further questions, please open an issue or contact haokun.lin@cripac.ia.ac.cn or xuhb2001@gmail.com.
Discussions and potential collaborations are also welcome.
This repo is built upon the following projects: AutoGPTQ, AWQ, QuaRot, DuQuant, and lm-eval.
We thank the authors for their code.
Please cite our work if you use our code or discuss our findings in your own research:
```bibtex
@article{lin2025quantization,
  title={Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs},
  author={Lin, Haokun and Xu, Haobo and Wu, Yichen and Guo, Ziyu and Zhang, Renrui and Lu, Zhichao and Wei, Ying and Zhang, Qingfu and Sun, Zhenan},
  journal={arXiv preprint arXiv:2508.14896},
  year={2025}
}
```

Explore our additional research on Post-training Quantization and Network Pruning:
- [DuQuant] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs
- [IntactKV] IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact
- [LRQ-DiT] LRQ-DiT: Log-Rotation Post-Training Quantization of Diffusion Transformers for Image and Video Generation
- [DopQ-ViT] DopQ-ViT: Towards Distribution-Friendly and Outlier-Aware Post-Training Quantization for Vision Transformers
- [RIA] Plug-and-Play: An Efficient Post-training Pruning Method for Large Language Models
- [MoPE-CLIP] MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric