First working GPU-accelerated wheel for Python 3.14!
- Built with CUDA Toolkit 13.1 (Dec 2025)
- Full layer offload + CUDA graphs
- Tested: Llama 3 8B Q4_K_M @ ~85 tokens/second on RTX 3090
Install: download the attached .whl file and run pip install on it.
Requires only an NVIDIA driver at runtime (no CUDA Toolkit needed).
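
Before installing, it may help to confirm the two prerequisites these notes mention: a Python 3.14 interpreter and an NVIDIA driver. The following is a minimal sketch, not part of this release. The helper names `python_matches` and `driver_tool_on_path` are hypothetical, and it assumes that finding `nvidia-smi` on PATH is a reasonable hint that a driver is installed (it does not prove the driver is working or recent enough).

```python
import shutil
import sys


def python_matches(required=(3, 14), version=None):
    """Return True if the interpreter's major.minor equals `required`."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) == tuple(required)


def driver_tool_on_path(tool="nvidia-smi"):
    """Return True if the driver's command-line tool is found on PATH.

    Assumption: nvidia-smi ships with the NVIDIA driver, so its presence
    is used here only as a heuristic for "a driver is installed".
    """
    return shutil.which(tool) is not None


if __name__ == "__main__":
    print("Python 3.14 interpreter:", python_matches())
    print("nvidia-smi found on PATH:", driver_tool_on_path())
```

If either check prints False, the wheel may fail to install or may not be able to use the GPU.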