
Commit 171bd19

Bump version to 0.3.18

Signed-off-by: JamePeng <jame_peng@sina.com>
Parent: d5131e2

File tree: 2 files changed (+20, -2 lines)


CHANGELOG.md: 19 additions, 1 deletion
```diff
@@ -7,6 +7,24 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [0.3.18]
+- feat: Update llama.cpp to [ggml-org/llama.cpp/commit/ce734a8a2f9fb6eb4f0383ab1370a1b0014ab787](https://github.com/ggml-org/llama.cpp/commit/ce734a8a2f9fb6eb4f0383ab1370a1b0014ab787)
+- feat: Sync llama.cpp llama/mtmd API Binding 20251215
+- feat: **implement `GLM46VChatHandler` for GLM-4.6V series models**
+- feat: **implement `LFM2VLChatHandler` for LFM2-VL series models**
+- feat: **implement `GLM41VChatHandler` for the GLM-4.1V-9B-Thinking model**
+- workflow: Added workflows for compiling with CUDA 13.0.2 on Windows and Linux.
+- feat: Added the scan path for CUDA 13.0+ dynamic-link libraries on Windows (`$env:CUDA_PATH\bin\x64`)
+- Optimization: Improved batch token processing logic in `Llava15ChatHandler`.
+- [perf: optimize LlamaModel.metadata reading performance](https://github.com/JamePeng/llama-cpp-python/commit/8213c19b0e164780ffffa3e64b5fc033cdbe4974)
+  - Increase the initial buffer size to 16 KB to eliminate re-allocations for large chat templates.
+  - Cache ctypes function references to reduce loop overhead.
+  - Repeated model loading can result in a cumulative speed improvement of 1-3%.
+- build: Improve CMakeLists target logic
+- refactor: Optimize the `LlamaGrammar` class code
+
+More information see: https://github.com/JamePeng/llama-cpp-python/compare/67421d546ddcaa07678ac7921a9f124e7e3de10e...d5131e2ff41e05f83fd847052b06938c7a551a6a
+
 ## [0.3.17]
 - feat: Update llama.cpp to [ggml-org/llama.cpp/commit/054a45c3d313387a4becd5eae982285932852b35](https://github.com/ggml-org/llama.cpp/commit/054a45c3d313387a4becd5eae982285932852b35)
 - feat: Sync llama.cpp llama/mtmd API Binding 20251121
```
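The three new multimodal handlers slot into the existing chat-handler API. A minimal sketch, assuming `GLM46VChatHandler` is exported from `llama_cpp.llama_chat_format` and takes the same `clip_model_path` argument as the existing `Llava15ChatHandler`; the `.gguf` file names below are placeholders:

```python
# Minimal sketch: wiring one of the new vision handlers into Llama.
# Assumptions: GLM46VChatHandler is exported from llama_cpp.llama_chat_format
# and takes clip_model_path like Llava15ChatHandler; file names are placeholders.
from llama_cpp import Llama
from llama_cpp.llama_chat_format import GLM46VChatHandler

chat_handler = GLM46VChatHandler(clip_model_path="glm-4.6v-mmproj.gguf")
llm = Llama(
    model_path="glm-4.6v.gguf",
    chat_handler=chat_handler,
    n_ctx=8192,  # vision prompts consume many tokens; leave headroom
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "file:///tmp/photo.png"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
)
print(response["choices"][0]["message"]["content"])
```

The same pattern should apply to `LFM2VLChatHandler` and `GLM41VChatHandler` with their respective mmproj files.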
The second CHANGELOG.md hunk fixes spacing in the 0.3.17 notes:

```diff
@@ -20,7 +38,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - feat: Optimize CUDA Wheel Build Workflow, now workflow action support python3.10-3.13 cu124-cu126-cu128 Basic(Non AVX)-AVX2 win-linux
 
 
-More information see : https://github.com/JamePeng/llama-cpp-python/compare/e5392b52036bd2770ece5269352f5600a8db5639...fbb0ed2f089c663a5eb75aadcad08f768041ed72
+More information see: https://github.com/JamePeng/llama-cpp-python/compare/e5392b52036bd2770ece5269352f5600a8db5639...fbb0ed2f089c663a5eb75aadcad08f768041ed72
 
 ## [0.3.16]
 
```
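On the CUDA scan-path entry: CUDA 13.0+ places its runtime DLLs under `$env:CUDA_PATH\bin\x64`, a directory older loaders never searched. A rough sketch of the pattern (not the fork's literal code) for registering that directory with the Windows DLL loader before the llama shared library is loaded:

```python
# Illustrative sketch (not the fork's literal code): register the CUDA 13.0+
# DLL directory on Windows before ctypes loads the llama shared library.
# Per the changelog, CUDA 13 keeps its runtime DLLs under %CUDA_PATH%\bin\x64.
import os
import sys

if sys.platform == "win32" and "CUDA_PATH" in os.environ:
    for sub in ("bin", os.path.join("bin", "x64")):
        cuda_dir = os.path.join(os.environ["CUDA_PATH"], sub)
        if os.path.isdir(cuda_dir):
            os.add_dll_directory(cuda_dir)  # make it visible to the DLL loader
```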

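On the metadata-reading optimization: the idea is one reusable 16 KB buffer plus locally cached ctypes function references, rather than a fresh allocation and attribute lookup per key/value pair. An illustrative sketch (not the commit's exact code), assuming `model` is a low-level `llama_model_p` handle and that the `llama_model_meta_*` bindings return the required length snprintf-style when the buffer is too small:

```python
# Illustrative sketch of the buffer-reuse pattern; not the commit's exact code.
import ctypes
import llama_cpp

def read_metadata(model) -> dict:
    # model: a low-level llama_model_p handle (assumption for this sketch)
    buf = ctypes.create_string_buffer(16 * 1024)  # fits large chat templates
    # Cache the ctypes function references once, outside the loop.
    meta_key = llama_cpp.llama_model_meta_key_by_index
    meta_val = llama_cpp.llama_model_meta_val_str_by_index

    def fetch(fn, i):
        nonlocal buf
        needed = fn(model, i, buf, ctypes.sizeof(buf))
        if needed >= ctypes.sizeof(buf):  # rare overflow: grow and retry once
            buf = ctypes.create_string_buffer(needed + 1)
            fn(model, i, buf, ctypes.sizeof(buf))
        return buf.value.decode("utf-8", errors="replace")

    metadata = {}
    for i in range(llama_cpp.llama_model_meta_count(model)):
        key = fetch(meta_key, i)
        metadata[key] = fetch(meta_val, i)
    return metadata
```

Starting at 16 KB means the grow-and-retry branch almost never fires, which is where the claimed 1-3% cumulative loading speedup comes from.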
llama_cpp/__init__.py: 1 addition, 1 deletion

```diff
@@ -1,4 +1,4 @@
 from .llama_cpp import *
 from .llama import *
 
-__version__ = "0.3.17"
+__version__ = "0.3.18"
```
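After upgrading, the version bump is visible at runtime:

```python
import llama_cpp

print(llama_cpp.__version__)  # "0.3.18"
```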
