Releases: JamePeng/llama-cpp-python
v0.3.18-cu130-AVX2-win-20251220
Bump version to 0.3.18
Signed-off-by: JamePeng <jame_peng@sina.com>
v0.3.18-cu130-AVX2-linux-20251220
Bump version to 0.3.18
Signed-off-by: JamePeng <jame_peng@sina.com>
v0.3.18-cu128-AVX2-win-20251220
Bump version to 0.3.18
Signed-off-by: JamePeng <jame_peng@sina.com>
v0.3.18-cu128-AVX2-linux-20251220
Bump version to 0.3.18
Changelog here: llama-cpp-python 0.3.18 Changelog
Signed-off-by: JamePeng <jame_peng@sina.com>
v0.3.18-cu126-AVX2-linux-20251220
Bump version to 0.3.18
Signed-off-by: JamePeng <jame_peng@sina.com>
v0.3.18-cu124-AVX2-linux-20251220
Bump version to 0.3.18
Changelog here: llama-cpp-python 0.3.18 Changelog
Signed-off-by: JamePeng <jame_peng@sina.com>
v0.3.17-cu130-AVX2-win-20251209
feat: perf: optimize LlamaModel.metadata reading performance
- Increase initial buffer size to 16KB to eliminate re-allocations for large chat templates.
- Cache ctypes function references to reduce loop overhead.
- Across repeated model loads this yields a cumulative speedup of roughly 1-3% (see the sketch below).
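A minimal sketch of the pattern described above, not the release's actual code: it assumes the llama.cpp metadata functions llama_model_meta_count, llama_model_meta_key_by_index, and llama_model_meta_val_str_by_index as exposed by the llama_cpp bindings; read_metadata and read_str are illustrative names.

```python
import ctypes

import llama_cpp  # the package's low-level ctypes bindings to llama.cpp

def read_metadata(model) -> dict:
    """Read all GGUF metadata key/value pairs from a loaded llama model."""
    buf_size = 16 * 1024  # 16 KiB up front: large chat templates fit without regrowing
    buf = ctypes.create_string_buffer(buf_size)

    # Cache the ctypes function references in locals once; repeated module
    # attribute lookups inside the loop are the overhead being removed.
    meta_count = llama_cpp.llama_model_meta_count
    key_by_index = llama_cpp.llama_model_meta_key_by_index
    val_by_index = llama_cpp.llama_model_meta_val_str_by_index

    def read_str(fn, i):
        # llama.cpp returns the string length, or a negative value on failure;
        # a length >= buf_size means the value was truncated, so regrow and retry.
        nonlocal buf, buf_size
        n = fn(model, i, buf, buf_size)
        if n < 0:
            return None
        if n >= buf_size:
            buf_size = n + 1
            buf = ctypes.create_string_buffer(buf_size)
            fn(model, i, buf, buf_size)
        return buf.value.decode("utf-8")

    metadata = {}
    for i in range(meta_count(model)):
        key = read_str(key_by_index, i)
        val = read_str(val_by_index, i)
        if key is not None and val is not None:
            metadata[key] = val
    return metadata
```

Hoisting the references matters because CPython resolves llama_cpp.llama_model_meta_* anew on every iteration; binding them to locals turns each call into a fast local load instead of a module attribute lookup.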
feat: Update submodule vendor/llama.cpp 2fa51c1..6b82eb7
feat: Sync ggml-zendnn: add ZenDNN backend for AMD CPUs
feat: workflow: Add CUDA 13.0.2 build workflows for Windows and Linux.
feat: Add the scan path for CUDA 13.0+ dynamic-link libraries on Windows ($env:CUDA_PATH\bin\x64); a sketch follows below.
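A minimal sketch of what such a scan-path addition looks like, using only the standard library (add_cuda13_dll_path is a hypothetical name, not the package's actual helper):

```python
import os

def add_cuda13_dll_path() -> None:
    # CUDA 13.0 ships its Windows runtime DLLs under bin\x64, so probe that
    # directory in addition to the classic bin location.
    cuda_path = os.environ.get("CUDA_PATH")  # set by the NVIDIA CUDA installer
    if os.name != "nt" or not cuda_path:
        return
    x64_bin = os.path.join(cuda_path, "bin", "x64")  # CUDA 13.0+ layout
    if os.path.isdir(x64_bin):
        # os.add_dll_directory (Python 3.8+, Windows-only) is the supported
        # way to extend the search path used when loading native libraries.
        os.add_dll_directory(x64_bin)
```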
v0.3.17-cu130-AVX2-linux-20251209
feat: perf: optimize LlamaModel.metadata reading performance
- Increase initial buffer size to 16KB to eliminate re-allocations for large chat templates.
- Cache ctypes function references to reduce loop overhead.
- Across repeated model loads this yields a cumulative speedup of roughly 1-3%.
feat: Update submodule vendor/llama.cpp 2fa51c1..6b82eb7
feat: Sync ggml-zendnn: add ZenDNN backend for AMD CPUs
feat: workflow: Add CUDA 13.0.2 build workflows for Windows and Linux.
feat: Add the scan path for CUDA 13.0+ dynamic-link libraries on Windows ($env:CUDA_PATH\bin\x64)
v0.3.17-cu128-AVX2-win-20251209
feat: perf: optimize LlamaModel.metadata reading performance
- Increase initial buffer size to 16KB to eliminate re-allocations for large chat templates.
- Cache ctypes function references to reduce loop overhead.
- Across repeated model loads this yields a cumulative speedup of roughly 1-3%.
feat: Update submodule vendor/llama.cpp 2fa51c1..6b82eb7
feat: Sync ggml-zendnn: add ZenDNN backend for AMD CPUs
feat: workflow: Add CUDA 13.0.2 build workflows for Windows and Linux.
feat: Add the scan path for CUDA 13.0+ dynamic-link libraries on Windows ($env:CUDA_PATH\bin\x64)
v0.3.17-cu128-AVX2-linux-20251209
feat: perf: optimize LlamaModel.metadata reading performance
- Increase initial buffer size to 16KB to eliminate re-allocations for large chat templates.
- Cache ctypes function references to reduce loop overhead.
- Across repeated model loads this yields a cumulative speedup of roughly 1-3%.
feat: Update submodule vendor/llama.cpp 2fa51c1..6b82eb7
feat: Sync ggml-zendnn: add ZenDNN backend for AMD CPUs
feat: workflow: Add CUDA 13.0.2 build workflows for Windows and Linux.
feat: Add the scan path for CUDA 13.0+ dynamic-link libraries on Windows ($env:CUDA_PATH\bin\x64)