Fully working GPU-accelerated wheel for llama-cpp-python==0.3.16 on Python 3.14 (Windows amd64).
Built December 17, 2025 with:
- CUDA Toolkit 13.1 (latest at build time)
- Full CUDA graph support
- Tested: ~85 tokens/second on Llama 3 8B Q4_K_M (RTX 3090)
pip install llama_cpp_python-0.3.16-cp314-cp314-win_amd64.whl
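After installing, a minimal usage sketch looks like the following. The `Llama` class, `n_gpu_layers`, and `n_ctx` are the standard llama-cpp-python API; `n_gpu_layers=-1` offloads all layers to the GPU. The model filename is a placeholder, not part of this release — point it at your own GGUF file (e.g. a Q4_K_M quant).

```python
# Hedged usage sketch: load a GGUF model with every layer offloaded to CUDA.
# MODEL_PATH is a placeholder filename, not shipped with the wheel.
import os

MODEL_PATH = "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf"  # placeholder

try:
    from llama_cpp import Llama
    HAVE_LLAMA_CPP = True
except ImportError:
    HAVE_LLAMA_CPP = False  # wheel not installed in this environment

if HAVE_LLAMA_CPP and os.path.exists(MODEL_PATH):
    # n_gpu_layers=-1 -> offload all layers to the GPU; n_ctx -> context window
    llm = Llama(model_path=MODEL_PATH, n_gpu_layers=-1, n_ctx=4096)
    out = llm("Q: Name one planet in the solar system. A:", max_tokens=16)
    print(out["choices"][0]["text"])
```

If generation runs but is slow, check the startup log for lines indicating layers were assigned to CUDA rather than the CPU backend.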