[BUG] Quantizing Qwen3-Next-80B-A3B-Instruct takes a long time #2177

@xiaotianns

Description

Describe the bug
Quantizing Qwen3-Next-80B-A3B-Instruct takes a long time: more than a day on a single GPU.
1. Should this 80B model be quantized across multiple GPUs? How much VRAM is needed to quantize an 80B model?
2. My GPU is an H20 with 96 GB of VRAM, but 60 GB is already occupied, leaving only about 30 GB available for quantization. Could this be the reason quantization is so slow?
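As a rough back-of-envelope sketch (my own estimate, not from the issue, and ignoring activations, quantization scales/zero-points, and calibration buffers), the raw weight footprint alone shows why ~30 GB of free VRAM is tight for an 80B-parameter model:

```python
def weight_gib(n_params: float, bits: int) -> float:
    """Raw weight storage in GiB for n_params parameters at the given bit width."""
    return n_params * bits / 8 / 2**30

n = 80e9  # approximate total parameter count of an 80B model

bf16 = weight_gib(n, 16)  # unquantized bf16 weights
int4 = weight_gib(n, 4)   # 4-bit quantized weights, excluding per-group scales

print(f"bf16 weights: ~{bf16:.0f} GiB, int4 weights: ~{int4:.0f} GiB")
```

With ~149 GiB of bf16 weights, layer-by-layer quantization in a 30 GB window forces heavy CPU-to-GPU weight shuffling, which could plausibly dominate the runtime.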

gptqmodel==5.0.0

[Screenshot attached in the original issue.]

GPU Info

Show output of:

nvidia-smi

Software Info

Operating System/Version + Python Version

Show output of:

pip show gptqmodel torch transformers accelerate triton

If you are reporting an inference bug of a post-quantized model, please post the content of config.json and quantize_config.json.

To Reproduce

How to reproduce this bug if possible.

Expected behavior

A clear and concise description of what you expected to happen.

Model/Datasets

Make sure your model/dataset is downloadable (on HF for example) so we can reproduce your issue.

Screenshots

If applicable, add screenshots to help explain your problem.

Additional context

Add any other context about the problem here.

Assignees

No one assigned

Labels

bug (Something isn't working)
