Describe the bug
Quantizing Qwen3-Next-80B-A3B-Instruct takes a very long time
Quantization has been running for more than one day, and I am only using a single GPU.
1. Should this 80B model be quantized across multiple GPUs? How much VRAM is needed to quantize an 80B model?
2. My GPU is an H20 with 96 GB of VRAM, but 60 GB is already occupied, leaving only about 30 GB available for quantization. Could this be the reason quantization is so slow?
gptqmodel==5.0.0
GPU Info
Show output of:
nvidia-smi
Software Info
Operating System/Version + Python Version
Show output of:
pip show gptqmodel torch transformers accelerate triton
If you are reporting an inference bug with a post-quantized model, please post the contents of config.json and quantize_config.json.
To Reproduce
How to reproduce this bug if possible.
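For reference, a minimal sketch of the kind of quantization script involved, following the standard GPTQModel load/quantize/save API. The model id, calibration dataset, and quantization parameters below are placeholders, not the exact script used in this report:

```python
# Minimal sketch, assuming the standard GPTQModel API (gptqmodel==5.0.0).
# Model id, calibration data, and quant parameters are illustrative placeholders.
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "Qwen/Qwen3-Next-80B-A3B-Instruct"
quant_path = "Qwen3-Next-80B-A3B-Instruct-gptq-4bit"

# Small text calibration set; the actual run may use a different dataset/size.
calibration_dataset = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(256))["text"]

quant_config = QuantizeConfig(bits=4, group_size=128)

model = GPTQModel.load(model_id, quant_config)

# This is the slow step: GPTQ quantizes the model layer by layer,
# so an 80B MoE model on one partially occupied GPU can take a long time.
model.quantize(calibration_dataset, batch_size=1)

model.save(quant_path)
```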
Expected behavior
A clear and concise description of what you expected to happen.
Model/Datasets
Make sure your model/dataset is downloadable (on HF for example) so we can reproduce your issue.
Screenshots
If applicable, add screenshots to help explain your problem.
Additional context
Add any other context about the problem here.