Frequently Asked Questions on Inference and Deployment of PaddleOCR-VL #16822
Bobholamovic announced in Announcements
Since its release, PaddleOCR-VL has been widely tested by the community, and we have received extensive feedback on inference and deployment. To help everyone use it more effectively, this post collects answers to frequently asked questions about PaddleOCR-VL inference and deployment and will be updated regularly.
dtype mismatch issue when using PaddlePaddle for inference on GPUs with compute capability < 8.5 (e.g., T4, V100)
As of October 24, 2025, the default inference method of PaddleOCR-VL (using PaddlePaddle dynamic graphs) now supports GPUs with compute capability ≥ 7.0. Please follow the official documentation to complete the installation. If you already have PaddleOCR installed locally, you can upgrade the PaddleX version to access the latest features by running the following command:
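The upgrade command itself was omitted above. A typical in-place pip upgrade of PaddleX looks like the following; this is a sketch only (any extras specifier the official docs require is not shown here), so follow the official installation documentation for the exact command:

```shell
# Upgrade PaddleX to the latest release to pick up the new
# Compute Capability >= 7.0 dtype support.
# (Sketch only -- consult the official docs for the exact command.)
python -m pip install -U paddlex
```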
Is deployment on hardware from Chinese vendors supported?
The current version mainly supports inference on devices such as x64 CPUs, NVIDIA GPUs, Hygon DCUs, and Baidu Kunlunxin XPUs. Support for hardware such as Huawei NPUs, MetaX GPUs, and Iluvatar GPUs is being prioritized and actively developed. Please stay tuned for future updates.
What are the minimum GPU configuration requirements?
Memory usage and inference speed may vary significantly across different hardware devices, influenced by the total GPU memory and Compute Capability. We have conducted tests on various hardware types, including several consumer-grade graphics cards. Currently, the minimum supported configuration that successfully runs is an RTX 3060 (12 GB), and the lowest supported Compute Capability is 7.0 (e.g., V100).
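As an illustration of the thresholds above, a small helper function (hypothetical, not part of PaddleOCR) that checks a device against the reported minimums: Compute Capability ≥ 7.0, and 12 GB as the smallest memory size verified to run (RTX 3060). On recent NVIDIA drivers the two inputs can be queried with `nvidia-smi --query-gpu=compute_cap,memory.total --format=csv`:

```shell
# meets_reported_minimums CC MEM_GB
# Prints "ok" if Compute Capability (e.g. 7.0, 8.6) is >= 7.0 and
# GPU memory is >= 12 GB, the smallest tested configuration; else "below".
meets_reported_minimums() {
  cc_major="${1%%.*}"
  mem_gb="$2"
  if [ "$cc_major" -ge 7 ] && [ "$mem_gb" -ge 12 ]; then
    echo "ok"
  else
    echo "below"
  fi
}

meets_reported_minimums 8.6 12   # RTX 3060 -> ok
meets_reported_minimums 7.0 16   # V100 -> ok
meets_reported_minimums 6.1 8    # GTX 1080 -> below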
Occasional OOM issues during inference
The default PaddlePaddle dynamic graph inference mode exhibits significant fluctuations in memory usage, with peak memory consumption potentially being high when processing complex images. For more stable memory usage, it is recommended to deploy using dedicated inference acceleration frameworks such as vLLM or SGLang.
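To deploy on an acceleration backend as recommended, the service can be started with the `paddleocr genai_server` invocation shown later in this post; the command below selects the vLLM backend (SGLang is selected analogously via `--backend sglang`):

```shell
# Serve PaddleOCR-VL through vLLM for more stable, pre-allocated GPU memory.
paddleocr genai_server --model_name PaddleOCR-VL-0.9B --backend vllm
```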
A safetensors error occurred during inference, prompting "framework paddle is invalid"
This issue is usually caused by an incorrect version of the safetensors package installed locally.
Please install the version compatible with the PaddlePaddle framework using the following commands:
Currently, this installation method is not supported on macOS.
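The install commands themselves were omitted above. A sketch of the usual fix, assuming the Paddle-compatible build is published on PaddlePaddle's own package index (the index URL below is an assumption; consult the official documentation for the real commands):

```shell
# Remove the standard PyPI build, which does not recognize the "paddle" framework.
python -m pip uninstall -y safetensors
# Install the PaddlePaddle-compatible build (index URL assumed -- see official docs).
python -m pip install safetensors --index-url https://www.paddlepaddle.org.cn/packages/stable/
```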
High GPU memory usage after starting vLLM / SGLang services
Inference acceleration frameworks like vLLM pre-allocate GPU memory during startup to enhance inference performance. If you need to adjust the amount of pre-allocated memory, you can modify the memory configuration via the --backend_config parameter when starting the service. For example, in vLLM, the gpu-memory-utilization parameter adjusts the fraction of total GPU memory that is allocated:
paddleocr genai_server --model_name PaddleOCR-VL-0.9B --backend vllm --backend_config <(echo -e 'gpu-memory-utilization: 0.3')
Is deployment on ARM-based CPUs (such as Apple's M4 chip) supported?
ARM-based CPUs are currently not supported. We will provide updates if support becomes available.
Is deployment on macOS supported?
PaddleOCR-VL currently does not offer native support for macOS. For devices with x64 CPU architectures, deployment can be done using Docker containers. In addition, for the macOS ecosystem, we are evaluating the feasibility of an MLX-VLM–based deployment solution.
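For the Docker route on x64 machines, a minimal sketch is shown below; the image name is a placeholder assumption, not an actual published tag, so substitute the image given in the official deployment documentation:

```shell
# Placeholder image name -- substitute the image from the official docs.
# Mount the current directory and open a shell inside the container.
docker run --rm -it -v "$(pwd):/work" -w /work \
  your-registry/paddleocr-vl:latest /bin/bash
```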