Commit 76b19af

Update README: add Llama 3.1 support and GGUF files.
1 parent 0c6b2bb

File tree: 1 file changed (+11, -4)

README.md

Lines changed: 11 additions & 4 deletions
@@ -1,6 +1,6 @@
 # Llama3.java
 
-Practical [Llama 3](https://github.com/meta-llama/llama3) inference implemented in a single Java file.
+Practical [Llama 3](https://github.com/meta-llama/llama3) and [3.1](https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1) inference implemented in a single Java file.
 
 <p align="center">
   <img width="700" src="https://github.com/mukel/llama3.java/assets/1896283/7939588c-c0ff-4261-b67f-8a54bad59ab5">
@@ -17,6 +17,7 @@ Besides the educational value, this project will be used to test and tune compil
 - [GGUF format](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) parser
 - Llama 3 tokenizer based on [minbpe](https://github.com/karpathy/minbpe)
 - Llama 3 inference with Grouped-Query Attention
+- Support for Llama 3.1 (ad-hoc RoPE scaling)
 - Support for Q8_0 and Q4_0 quantizations
 - Fast matrix-vector multiplication routines for quantized tensors using Java's [Vector API](https://openjdk.org/jeps/469)
 - Simple CLI with `--chat` and `--instruct` modes.
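The feature added in this hunk, Llama 3.1's "ad-hoc" RoPE scaling, rescales each RoPE inverse frequency according to its wavelength relative to the original 8192-token context, so long-wavelength dimensions are stretched while short-wavelength ones are left untouched. Below is a minimal Java sketch of the published rule with the reference constants (scale factor 8, low/high frequency factors 1 and 4); class and method names are illustrative, not taken from Llama3.java.

```java
// Sketch of Llama 3.1's RoPE frequency rescaling with the published
// reference constants. Names here are illustrative.
public final class Llama31RopeScaling {

    static final double SCALE_FACTOR = 8.0;
    static final double LOW_FREQ_FACTOR = 1.0;
    static final double HIGH_FREQ_FACTOR = 4.0;
    static final double OLD_CONTEXT_LENGTH = 8192.0; // Llama 3's original context

    /** Rescales one RoPE inverse frequency according to its wavelength band. */
    static double scaleFrequency(double freq) {
        double waveLength = 2.0 * Math.PI / freq;
        double lowFreqWaveLength = OLD_CONTEXT_LENGTH / LOW_FREQ_FACTOR;   // 8192
        double highFreqWaveLength = OLD_CONTEXT_LENGTH / HIGH_FREQ_FACTOR; // 2048
        if (waveLength < highFreqWaveLength) {
            return freq; // high-frequency band: unchanged
        } else if (waveLength > lowFreqWaveLength) {
            return freq / SCALE_FACTOR; // low-frequency band: fully scaled down
        } else {
            // mid band: interpolate smoothly between scaled and unscaled
            double smooth = (OLD_CONTEXT_LENGTH / waveLength - LOW_FREQ_FACTOR)
                    / (HIGH_FREQ_FACTOR - LOW_FREQ_FACTOR);
            return (1.0 - smooth) * freq / SCALE_FACTOR + smooth * freq;
        }
    }

    public static void main(String[] args) {
        int headSize = 128;
        double base = 500_000.0; // Llama 3's rope_theta
        for (int i = 0; i < headSize; i += 2) {
            double freq = Math.pow(base, -((double) i / headSize));
            System.out.printf("dim %3d: freq %.3e -> %.3e%n", i, freq, scaleFrequency(freq));
        }
    }
}
```

Only the low-frequency (long-wavelength) dimensions are divided by the scale factor; the mid band interpolates so the rotation angles vary continuously across dimensions.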
@@ -30,14 +31,20 @@ Here's the interactive `--chat` mode in action:
 ## Setup
 
 Download pure `Q4_0` and (optionally) `Q8_0` quantized .gguf files from:
+https://huggingface.co/mukel/Meta-Llama-3.1-8B-Instruct-GGUF
 https://huggingface.co/mukel/Meta-Llama-3-8B-Instruct-GGUF
 
 The `~4.3GB` pure `Q4_0` quantized model is recommended, please be gentle with [huggingface.co](https://huggingface.co) servers:
 ```
+# Llama 3.1
+curl -L -O https://huggingface.co/mukel/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-Q4_0.gguf
+
+# Llama 3
 curl -L -O https://huggingface.co/mukel/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q4_0.gguf
 
 # Optionally download the Q8_0 quantized model ~8GB
-# curl -L -O https://huggingface.co/mukel/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q8_0.gguf
+# curl -L -O https://huggingface.co/mukel/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q8_0.gguf
+# curl -L -O https://huggingface.co/mukel/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf
 ```
 
 #### Optional: quantize to pure `Q4_0` manually
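For reference, the pure `Q4_0` format requested above stores weights in 18-byte blocks of 32 values: a float16 scale followed by 16 bytes of packed 4-bit quants, where each weight decodes as `scale * (quant - 8)`. Here is a minimal Java sketch of decoding one such block, assuming ggml's layout (low nibbles hold elements 0..15, high nibbles elements 16..31); helper names are illustrative, not the ones used in Llama3.java.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of decoding one Q4_0 block as laid out in GGUF/ggml:
// 32 weights per block, stored as a float16 scale followed by
// 16 bytes of packed 4-bit quants; weight = scale * (quant - 8).
public final class Q4_0Block {
    static final int BLOCK_SIZE = 32;      // weights per block
    static final int BYTES_PER_BLOCK = 18; // 2 (fp16 scale) + 16 (packed nibbles)

    /** Dequantizes one 18-byte Q4_0 block into 32 floats. */
    static float[] dequantize(ByteBuffer block) {
        block.order(ByteOrder.LITTLE_ENDIAN);
        float scale = Float.float16ToFloat(block.getShort()); // fp16 scale, Java 20+
        float[] out = new float[BLOCK_SIZE];
        for (int j = 0; j < BLOCK_SIZE / 2; j++) {
            int b = block.get() & 0xFF;
            out[j]                  = scale * ((b & 0x0F) - 8); // low nibble: element j
            out[j + BLOCK_SIZE / 2] = scale * ((b >>> 4) - 8);  // high nibble: element j + 16
        }
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer block = ByteBuffer.allocate(BYTES_PER_BLOCK).order(ByteOrder.LITTLE_ENDIAN);
        block.putShort(Float.floatToFloat16(0.5f)); // scale = 0.5
        for (int j = 0; j < BLOCK_SIZE / 2; j++) {
            block.put((byte) 0xA3); // low nibble 3 -> -2.5, high nibble 10 -> +1.0
        }
        block.flip();
        float[] w = dequantize(block);
        System.out.println(w[0] + " " + w[16]); // prints -2.5 1.0
    }
}
```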
@@ -47,7 +54,7 @@ A **pure** `Q4_0` quantization can be generated from a high precision (F32, F16,
 with the `quantize` utility from [llama.cpp](https://github.com/ggerganov/llama.cpp) as follows:
 
 ```bash
-./quantize --pure ./Meta-Llama-3-8B-Instruct-F32.gguf ./Meta-Llama-3-8B-Instruct-Q4_0.gguf Q4_0
+./llama-quantize --pure ./Meta-Llama-3-8B-Instruct-F32.gguf ./Meta-Llama-3-8B-Instruct-Q4_0.gguf Q4_0
 ```
 
 ## Build and run
@@ -75,7 +82,7 @@ java --enable-preview --source 21 --add-modules jdk.incubator.vector Llama3.java
 A simple [Makefile](./Makefile) is provided, run `make` to produce `llama3.jar` or manually:
 ```bash
 javac -g --enable-preview -source 21 --add-modules jdk.incubator.vector -d target/classes Llama3.java
-jar -cvfe llama3.jar Llama3 LICENSE -C target/classes .
+jar -cvfe llama3.jar com.llama4j.Llama3 LICENSE -C target/classes .
 ```
 
 Run the resulting `llama3.jar` as follows:
