llama-cpp-python 0.3.16 - Python 3.14 + CUDA 13.1 GPU Wheel

@rookiemann rookiemann released this 18 Dec 01:21
· 1 commit to main since this release
450c44f

First working GPU-accelerated wheel for Python 3.14!

  • Built with CUDA Toolkit 13.1 (Dec 2025)
  • Full layer offload + CUDA graphs
  • Tested: Llama 3 8B Q4_K_M @ ~85 tokens/second on RTX 3090

Install: download the attached .whl file and run pip install on it.
Requires only an NVIDIA driver at runtime (the CUDA Toolkit is not needed).
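
After installing, a quick sanity check can confirm that the interpreter matches the wheel and that the build reports GPU offload support. The sketch below is not part of the release. The helper name `check_install` and the report keys are made up for illustration, and it assumes the wheel exposes the usual `llama_cpp` low-level binding `llama_supports_gpu_offload()`.

```python
import importlib.util
import sys


def check_install():
    """Report whether this interpreter matches the wheel and whether GPU offload is available.

    Hypothetical helper for illustration; not shipped with the wheel.
    """
    report = {
        # The wheel targets CPython 3.14, so other versions will refuse to install it.
        "python_is_3_14": sys.version_info[:2] == (3, 14),
        # find_spec avoids importing the package just to see if it exists.
        "llama_cpp_installed": importlib.util.find_spec("llama_cpp") is not None,
        # Stays None when the package is not installed.
        "gpu_offload": None,
    }
    if report["llama_cpp_installed"]:
        try:
            import llama_cpp

            # Assumed to be True for a CUDA build and False for a CPU-only build.
            report["gpu_offload"] = bool(llama_cpp.llama_supports_gpu_offload())
        except Exception as exc:  # e.g. the shared library failed to load
            report["gpu_offload"] = False
            report["error"] = repr(exc)
    return report


if __name__ == "__main__":
    for key, value in check_install().items():
        print(f"{key}: {value}")
```

If `gpu_offload` comes back `True`, loading a model with `Llama(model_path="model.gguf", n_gpu_layers=-1)` should request offload of all layers; the model path here is a placeholder.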