CUDA boost #40
neocoretechs started this conversation in Show and tell
Replies: 1 comment 1 reply
Hello @neocoretechs, did you try https://github.com/beehive-lab/GPULlama3.java?
I am finishing up custom CUDA kernels and device helpers to boost model runner performance. I am using FFI to call pinpointed crosscuts that move the critical data up to the device, leave it there, process it, then move it back down with minimal traffic. Device helpers: GGUF dequant, sdot. Kernels: matmul, softmax, rmsnorm. I am adding a USE_CUDA flag to turn the features on or off. Will provide a DLL and a .so for windoze and aarch64. Testbed: windoze 11 / Nvidia A2000, CUDA 13; Jetson Orin Nano, CUDA 13, Ubuntu.
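For anyone curious how the USE_CUDA flag and the FFI crosscuts could fit together, here is a minimal sketch using the Java FFM API. It is not the actual implementation: the library name (`llamacuda`), the exported symbol (`cuda_matmul`) and its signature, and the system property driving `USE_CUDA` are all assumptions for illustration. It shows only the downcall plumbing and the CPU fallback, not the device-resident buffer management described above; the real helpers (GGUF dequant, sdot) and kernels (softmax, rmsnorm) would be bound the same way.

```java
// Hypothetical sketch, not the author's code. Assumes a native library "llamacuda"
// (libllamacuda.so on Linux/aarch64, llamacuda.dll on Windows) that exports
// void cuda_matmul(float* out, float* a, float* b, int n, int d).
import java.lang.foreign.*;
import java.lang.invoke.MethodHandle;

public final class CudaBridge {
    // USE_CUDA toggle as described in the post; wiring it to a system property is an assumption.
    static final boolean USE_CUDA = Boolean.getBoolean("llama.use_cuda");

    private static final MethodHandle MATMUL;

    static {
        MethodHandle mh = null;
        if (USE_CUDA) {
            // Bind the exported CUDA kernel entry point through the FFM linker.
            Linker linker = Linker.nativeLinker();
            SymbolLookup lib = SymbolLookup.libraryLookup(
                    System.mapLibraryName("llamacuda"), Arena.global());
            mh = linker.downcallHandle(
                    lib.find("cuda_matmul").orElseThrow(),
                    FunctionDescriptor.ofVoid(
                            ValueLayout.ADDRESS, ValueLayout.ADDRESS, ValueLayout.ADDRESS,
                            ValueLayout.JAVA_INT, ValueLayout.JAVA_INT));
        }
        MATMUL = mh;
    }

    /** Routes matmul to the CUDA kernel when USE_CUDA is set, otherwise runs a plain Java loop. */
    static void matmul(MemorySegment out, MemorySegment a, MemorySegment b, int n, int d) {
        if (USE_CUDA) {
            try {
                MATMUL.invokeExact(out, a, b, n, d);
                return;
            } catch (Throwable t) {
                throw new RuntimeException("cuda_matmul failed", t);
            }
        }
        // CPU fallback: out[i] = dot(a, row i of b), purely illustrative.
        for (int i = 0; i < d; i++) {
            float acc = 0f;
            for (int j = 0; j < n; j++) {
                acc += a.getAtIndex(ValueLayout.JAVA_FLOAT, j)
                     * b.getAtIndex(ValueLayout.JAVA_FLOAT, (long) i * n + j);
            }
            out.setAtIndex(ValueLayout.JAVA_FLOAT, i, acc);
        }
    }
}
```

In the design described in the post, the critical buffers stay device-resident between kernel calls; the sketch passes host segments per call only to keep the example short.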