
@taimur-10x taimur-10x commented Dec 5, 2025

Summary

This PR adds repacking and GEMM/GEMV kernels for floating-point types (FP16 and FP32).

Key Changes

  • RVV support is added for the F16 and F32 types (F16 requires the zvfh extension).

Benchmarking Results

End-to-end benchmarking on BananaPI-BPI F3 (VLEN=256)

Prefill / Prompt Processing (GEMM)

Tokens / Second

| Model | Prompt Size | Repack GEMM (7x32) | Vec Dot |
|---|---|---|---|
| Tinyllama F16 1.1B | 32 | 16.72 | 8.42 |
| Tinyllama F16 1.1B | 64 | 21.55 | 7.57 |
| Tinyllama F16 1.1B | 128 | 21.20 | 8.78 |
| Tinyllama F16 1.1B | 256 | 21.82 | 8.57 |
| Tinyllama F16 1.1B | 512 | 21.81 | 8.68 |

Decode (GEMV)

Tokens / Second

| Model | Decode Size (Prompt=32) | Repack GEMV (1x32) | Vec Dot |
|---|---|---|---|
| Tinyllama F16 1.1B | 10 | 3.37 | 3.11 |
| Tinyllama F16 1.1B | 16 | 3.29 | 3.45 |
| Tinyllama F16 1.1B | 32 | 3.12 | 3.25 |
| Tinyllama F16 1.1B | 64 | 3.23 | 3.27 |
| Tinyllama F16 1.1B | 100 | 3.04 | 3.15 |
| Tinyllama F16 1.1B | 128 | 3.09 | 3.20 |
| Tinyllama F16 1.1B | 256 | 3.15 | 3.19 |

@taimur-10x taimur-10x requested a review from ggerganov as a code owner December 5, 2025 11:22
@taimur-10x taimur-10x marked this pull request as draft December 5, 2025 11:24
@github-actions github-actions bot added the "ggml" label (changes relating to the ggml tensor library for machine learning) Dec 5, 2025
