I am trying to quantize the dots_ocr model, whose submodules are ['model', 'vision_tower', 'lm_head'], as shown below:
Tracing through the code, I found that this code parses the modules to quantize, but it only returns one module. In my experiments, I can quantize either vision_tower or model on its own and load the result with vLLM, but I can't quantize both at once because of the mechanism above. Could you please help me figure out how to solve this? Thanks ^_^
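
To make the problem concrete, here is a minimal, self-contained sketch of the behaviour I think I am seeing versus what I would like. Everything in it is hypothetical: `MODEL_TREE`, `pick_first`, `pick_all`, and the layer names are made up for illustration and are not the library's actual code or the real dots_ocr layout.

```python
# Hypothetical illustration only; not the quantization library's real code.
from typing import Dict, List

# Toy stand-in for the model layout: submodule name -> names of its layers.
# The layer names are invented for this example.
MODEL_TREE: Dict[str, List[str]] = {
    "model": ["layers.0", "layers.1"],
    "vision_tower": ["blocks.0", "blocks.1"],
    "lm_head": [],
}


def pick_first(tree: Dict[str, List[str]], wanted: List[str]) -> List[str]:
    """Mimic what I observe: stop at the first requested submodule found."""
    for name in wanted:
        if name in tree:
            return [f"{name}.{layer}" for layer in tree[name]]
    return []


def pick_all(tree: Dict[str, List[str]], wanted: List[str]) -> List[str]:
    """What I would like: collect layers from every requested submodule."""
    found: List[str] = []
    for name in wanted:
        found.extend(f"{name}.{layer}" for layer in tree.get(name, []))
    return found


if __name__ == "__main__":
    targets = ["model", "vision_tower"]
    print(pick_first(MODEL_TREE, targets))  # only the 'model' layers
    print(pick_all(MODEL_TREE, targets))    # layers from both submodules
```

In other words, I am looking for something like `pick_all`: a way to pass both model and vision_tower so that layers from both are quantized in one run.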