Motivation.
- vLLM supports a large variety of quantization formats. This is hard to maintain and makes the codebase complex.
- Many mature frameworks (llm-compressor, modelopt, quark, torchao) have emerged that provide general-purpose implementations of various quantization schemes, and usage stats show limited usage of the older formats.
Proposed Change.
- Deprecate many of the legacy quantization formats.
Kept (see the usage sketch after this list):
- compressed-tensors
- quark
- awq.py (to be deprecated later; many existing models use it, but AutoAWQ is no longer maintained)
- bitsandbytes.py
- fp8.py
- mxfp4.py
- modelopt.py
- gguf.py
- gptq.py (to be deprecated later; many existing models use it, but AutoGPTQ is no longer maintained)
- torchao.py
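For the kept formats, the user-facing selection path stays the same. A minimal sketch, assuming the existing `quantization` argument on `vllm.LLM` keeps working as it does today; the model path is a placeholder:

```python
from vllm import LLM

# Kept backends remain selectable via the existing `quantization` argument.
# "path/to/awq-model" is a placeholder for any AWQ-quantized checkpoint.
llm = LLM(model="path/to/awq-model", quantization="awq")
print(llm.generate("Hello, my name is")[0].outputs[0].text)
```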
Proposed to be removed (per usage stats; a deprecation-warning sketch follows the list):
- auto_round.py
- awq_marlin.py (consolidate to awq.py)
- awq_triton.py (consolidate to awq.py)
- bitblas.py
- cpu_wna16.py
- deepspeedfp.py
- experts_int8.py
- fbgemm_fp8.py
- fp_quant.py
- gptq_bitblas.py
- gptq_marlin.py (consolidate to gptq.py)
- gptq_marlin_24.py
- hqq_marlin.py
- inc.py
- input_quant_fp8.py
- ipex_quant.py
- moe_wna16.py
- petit.py
- ptpc_fp8.py
- rtn.py
- tpu_int8.py
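As a possible transition step (not spelled out in this RFC, just a hedged sketch), each removed method could emit a deprecation warning for one release before deletion, pointing users at the consolidated backend. The helper below and its name are hypothetical:

```python
import warnings

def warn_deprecated_quantization(method: str, replacement: str | None = None) -> None:
    # Hypothetical helper: warn when a to-be-removed quantization method is
    # requested during the transition release, before the backend is deleted.
    msg = f"Quantization method '{method}' is deprecated and will be removed."
    if replacement is not None:
        msg += f" Use '{replacement}' instead."
    warnings.warn(msg, DeprecationWarning, stacklevel=2)

# e.g. awq_marlin / awq_triton requests would point at the consolidated awq backend
warn_deprecated_quantization("awq_marlin", replacement="awq")
```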
Feedback Period.
2 Weeks
CC List.
Any Other Things.
The goal is to clean up the codebase:
- reduce mental load
- reduce complexity of implementing features (e.g. FusedMoE refactor)
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.