Releases: rookiemann/llama-cpp-python-py314-cuda131-wheel-or-python314-llama-cpp-gpu-wheel

llama-cpp-python 0.3.16 - Python 3.14 + CUDA 13.1 GPU Wheel

18 Dec 01:21
450c44f

First working GPU-accelerated wheel for Python 3.14!

  • Built with CUDA Toolkit 13.1 (Dec 2025)
  • Full layer offload + CUDA graphs
  • Tested: Llama 3 8B Q4_K_M @ ~85 tokens/second on RTX 3090
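A minimal sketch of how full layer offload might be requested and how a tokens/second figure like the one above could be measured. It assumes the standard llama-cpp-python `Llama` constructor, where `n_gpu_layers=-1` asks for every layer to be offloaded. The helper names (`offload_kwargs`, `tokens_per_second`, `benchmark`), the prompt, and the context size are illustrative and not taken from this release; the timing includes prompt processing, so it will not exactly match a generation-only figure.

```python
import time


def offload_kwargs(model_path: str, n_ctx: int = 4096) -> dict:
    """Build keyword arguments for llama_cpp.Llama requesting full GPU offload.

    n_gpu_layers=-1 asks llama-cpp-python to offload all layers.
    n_ctx=4096 is an arbitrary illustrative context size.
    """
    return {
        "model_path": model_path,
        "n_gpu_layers": -1,
        "n_ctx": n_ctx,
        "verbose": False,
    }


def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Convert a token count and a wall-clock duration into a throughput."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return completion_tokens / elapsed_s


def benchmark(model_path: str, prompt: str = "Explain CUDA graphs briefly.",
              max_tokens: int = 128) -> float:
    """Hypothetical helper: run one completion and return rough tokens/second."""
    # Imported lazily so the helpers above work without the wheel installed.
    from llama_cpp import Llama

    llm = Llama(**offload_kwargs(model_path))
    start = time.perf_counter()
    out = llm(prompt, max_tokens=max_tokens)
    elapsed = time.perf_counter() - start  # includes prompt processing time
    return tokens_per_second(out["usage"]["completion_tokens"], elapsed)
```

Calling `benchmark("path/to/model.gguf")` with a local GGUF file would print nothing by itself; it returns the measured rate, which depends on the GPU, quantization, and prompt length.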

Install: download the attached .whl file and pass its path to pip install.
Requires only an NVIDIA driver at runtime (the CUDA Toolkit does not need to be installed).
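Because the wheel targets one specific Python version, it can help to check the interpreter before installing. The sketch below reads the Python tag from a wheel filename, following the standard wheel naming scheme (`name-version[-build]-python-abi-platform.whl`), and compares it with the running interpreter. The filename used in the usage line is hypothetical; substitute the name of the file actually attached to this release.

```python
import sys


def wheel_python_tag(wheel_filename: str) -> str:
    """Return the python tag (e.g. 'cp314') from a wheel filename."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError("not a wheel filename")
    parts = wheel_filename.removesuffix(".whl").split("-")
    # Five fields, or six when an optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename layout")
    # The last three fields are always python tag, abi tag, platform tag.
    return parts[-3]


def matches_interpreter(wheel_filename: str, version_info=sys.version_info) -> bool:
    """True if the wheel's python tag includes this CPython major.minor."""
    expected = f"cp{version_info[0]}{version_info[1]}"
    # Compressed tag sets are separated by dots, e.g. 'cp313.cp314'.
    return expected in wheel_python_tag(wheel_filename).split(".")


if __name__ == "__main__":
    # Hypothetical filename, for illustration only.
    name = "llama_cpp_python-0.3.16-cp314-cp314-win_amd64.whl"
    print(matches_interpreter(name))
```

This only compares the CPython version tag; it does not check the platform tag or the installed NVIDIA driver version.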