Frequently Asked Questions on Inference and Deployment of PaddleOCR-VL #16822
Bobholamovic announced in Announcements
Since its release, PaddleOCR-VL has been widely tested by the community, and we have received extensive feedback on inference and deployment. To help everyone use it more effectively, this post collects answers to frequently asked questions about PaddleOCR-VL inference and deployment and will be updated regularly.
dtype mismatch issue when using PaddlePaddle for inference on GPUs with compute capability < 8.5 (e.g., T4, V100)
As of October 24, 2025, the default inference method of PaddleOCR-VL (using PaddlePaddle dynamic graphs) now supports GPUs with compute capability ≥ 7.0. Please follow the official documentation to complete the installation. If you already have PaddleOCR installed locally, you can upgrade the PaddleX version to access the latest features by running the following command:
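The upgrade command itself was omitted above. A typical in-place pip upgrade of PaddleX looks like the following; this is a sketch only (any extras specifier the official docs require is not shown here), so follow the official installation documentation for the exact command:

```shell
# Upgrade PaddleX to the latest release to pick up the new
# Compute Capability >= 7.0 dtype support.
# (Sketch only -- consult the official docs for the exact command.)
python -m pip install -U paddlex
```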
Is deployment on hardware from Chinese vendors supported?
The current version mainly supports inference on devices such as x64 CPUs, NVIDIA GPUs, Hygon DCUs, and Baidu Kunlunxin XPUs. Support for hardware such as Huawei NPUs, MetaX GPUs, and Iluvatar GPUs is being prioritized and actively developed. Please stay tuned for future updates.
What are the minimum GPU configuration requirements?
Memory usage and inference speed may vary significantly across different hardware devices, influenced by the total GPU memory and Compute Capability. We have conducted tests on various hardware types, including several consumer-grade graphics cards. Currently, the minimum supported configuration that successfully runs is an RTX 3060 (12 GB), and the lowest supported Compute Capability is 7.0 (e.g., V100).
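As an illustration of the thresholds above, a small helper function (hypothetical, not part of PaddleOCR) that checks a device against the reported minimums: Compute Capability ≥ 7.0, and 12 GB as the smallest memory size verified to run (RTX 3060). On recent NVIDIA drivers the two inputs can be queried with `nvidia-smi --query-gpu=compute_cap,memory.total --format=csv`:

```shell
# meets_reported_minimums CC MEM_GB
# Prints "ok" if Compute Capability (e.g. 7.0, 8.6) is >= 7.0 and
# GPU memory is >= 12 GB, the smallest tested configuration; else "below".
meets_reported_minimums() {
  cc_major="${1%%.*}"
  mem_gb="$2"
  if [ "$cc_major" -ge 7 ] && [ "$mem_gb" -ge 12 ]; then
    echo "ok"
  else
    echo "below"
  fi
}

meets_reported_minimums 8.6 12   # RTX 3060 -> ok
meets_reported_minimums 7.0 16   # V100 -> ok
meets_reported_minimums 6.1 8    # GTX 1080 -> below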
Occasional OOM issues during inference
The default PaddlePaddle dynamic graph inference mode exhibits significant fluctuations in memory usage, with peak memory consumption potentially being high when processing complex images. For more stable memory usage, it is recommended to deploy using dedicated inference acceleration frameworks such as vLLM or SGLang.
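To deploy on an acceleration backend as recommended, the service can be started with the `paddleocr genai_server` invocation shown later in this post; the command below selects the vLLM backend (SGLang is selected analogously via `--backend sglang`):

```shell
# Serve PaddleOCR-VL through vLLM for more stable, pre-allocated GPU memory.
paddleocr genai_server --model_name PaddleOCR-VL-0.9B --backend vllm
```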
A safetensors error occurred during inference, prompting "framework paddle is invalid"
This issue is usually caused by an incorrect version of the safetensors package installed locally.
Please install the version compatible with the PaddlePaddle framework using the following commands:
Currently, this installation method is not supported on macOS.
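The install commands themselves were omitted above. A sketch of the usual fix, assuming the Paddle-compatible build is published on PaddlePaddle's own package index (the index URL below is an assumption; consult the official documentation for the real commands):

```shell
# Remove the standard PyPI build, which does not recognize the "paddle" framework.
python -m pip uninstall -y safetensors
# Install the PaddlePaddle-compatible build (index URL assumed -- see official docs).
python -m pip install safetensors --index-url https://www.paddlepaddle.org.cn/packages/stable/
```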
High GPU memory usage after starting vLLM / SGLang services
Inference acceleration frameworks like vLLM pre-allocate GPU memory during startup to enhance inference performance. If you need to adjust the amount of pre-allocated memory, you can modify the memory configuration via the --backend_config parameter when starting the service. For example, in vLLM, the gpu-memory-utilization parameter adjusts the fraction of total GPU memory that is allocated:
paddleocr genai_server --model_name PaddleOCR-VL-0.9B --backend vllm --backend_config <(echo -e 'gpu-memory-utilization: 0.3')
Is deployment on ARM-based CPUs (such as Apple's M4 chip) supported?
ARM-based CPUs are currently not supported. We will provide updates if support becomes available.
Is deployment on macOS supported?
PaddleOCR-VL currently does not offer native support for macOS. For devices with x64 CPU architectures, deployment can be done using Docker containers. In addition, for the macOS ecosystem, we are evaluating the feasibility of an MLX-VLM–based deployment solution.
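For the Docker route on x64 machines, a minimal sketch is shown below; the image name is a placeholder assumption, not an actual published tag, so substitute the image given in the official deployment documentation:

```shell
# Placeholder image name -- substitute the image from the official docs.
# Mount the current directory and open a shell inside the container.
docker run --rm -it -v "$(pwd):/work" -w /work \
  your-registry/paddleocr-vl:latest /bin/bash
```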