- Increase initial buffer size to 16KB to eliminate re-allocations for large chat templates (a sketch of the pattern follows this list).
- Cache ctypes function references to reduce loop overhead (see the second sketch below).
- Together, these changes can yield a cumulative speed improvement of 1-3% for repeated model loading.
- build: Improve CMakeLists target logic
- refactor: optimize LlamaGrammar class code
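
The buffer-size change follows a standard ctypes pattern: allocate once at a size that covers typical output, and grow only when the C call reports a larger requirement. A minimal sketch, assuming a hypothetical `render_fn` with the same call/return convention as llama.cpp's `llama_chat_apply_template` (it returns the byte count the full result needs, which may exceed the buffer); the names here are illustrative, not the library's internals:

```python
import ctypes

# 16 KiB covers most rendered chat templates, so the common case
# needs no re-allocation.
INITIAL_BUF_SIZE = 16 * 1024

def apply_chat_template(render_fn, *args) -> str:
    """Render via a C-style function that returns the required length.

    `render_fn(buf, buf_size, *args)` stands in for a binding like
    llama_chat_apply_template: it writes into `buf` and returns the
    number of bytes the complete result needs.
    """
    size = INITIAL_BUF_SIZE
    buf = ctypes.create_string_buffer(size)
    needed = render_fn(buf, size, *args)
    if needed > size:
        # Rare path: the template outgrew 16 KiB; re-allocate exactly once.
        buf = ctypes.create_string_buffer(needed)
        needed = render_fn(buf, needed, *args)
    return buf.raw[:needed].decode("utf-8")
```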
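
The ctypes-caching change exploits the fact that every `lib.func` access is an attribute lookup on the CDLL object; binding the function to a local variable hoists that lookup out of hot loops. A small illustration using libc's `strlen` as a stand-in for a frequently-called binding (assumes a POSIX system where `find_library("c")` resolves):

```python
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

data = [b"alpha", b"beta", b"gamma"] * 10_000

# Before: every iteration pays an attribute lookup on the CDLL object.
total = 0
for s in data:
    total += libc.strlen(s)

# After: cache the ctypes function reference once, outside the loop.
strlen = libc.strlen
total = 0
for s in data:
    total += strlen(s)
```
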
For more information, see: https://github.com/JamePeng/llama-cpp-python/compare/67421d546ddcaa07678ac7921a9f124e7e3de10e...d5131e2ff41e05f83fd847052b06938c7a551a6a
## [0.3.17]
- feat: Update llama.cpp to [ggml-org/llama.cpp/commit/054a45c3d313387a4becd5eae982285932852b35](https://github.com/ggml-org/llama.cpp/commit/054a45c3d313387a4becd5eae982285932852b35)
- feat: Sync llama.cpp llama/mtmd API Binding 20251121
- feat: Optimize CUDA Wheel Build Workflow; the workflow action now supports Python 3.10-3.13, CUDA cu124/cu126/cu128, and Basic (non-AVX) and AVX2 builds on Windows and Linux
For more information, see: https://github.com/JamePeng/llama-cpp-python/compare/e5392b52036bd2770ece5269352f5600a8db5639...fbb0ed2f089c663a5eb75aadcad08f768041ed72