@gongchensu
Collaborator

@gongchensu force-pushed the Issue/846 branch 2 times, most recently from 246fd9c to c715cba on December 26, 2025 08:34
…nd __ldg

- Add vectorized memory access using float4/float2, half2, and bfloat162
- Use the __ldg intrinsic for read-only loads of weight and indices
- Add memory alignment checks to enable vectorized paths
- Add __restrict__ qualifiers so the compiler can assume the pointers do not alias
- Implement dynamic block size selection based on embedding_dim
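
For readers who have not seen the diff, the following is a minimal sketch of how the changes listed above could fit together for the float32 case. It is not the code from this PR: the names `is_aligned`, `pick_block_size`, `embedding_scalar`, `embedding_vec4`, and `launch_embedding`, the 32..256 block-size clamp, and the one-block-per-index layout are all assumptions made for illustration. Only `__ldg`, `__restrict__`, and `float4` come from the commit message.

```cuda
#include <cstddef>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical helper: true if ptr is a multiple of `bytes`.
static inline bool is_aligned(const void* ptr, size_t bytes) {
    return reinterpret_cast<uintptr_t>(ptr) % bytes == 0;
}

// Hypothetical heuristic: one thread per work item, rounded up to a
// whole warp and clamped to [32, 256] threads per block.
static inline int pick_block_size(int work_items) {
    int threads = ((work_items + 31) / 32) * 32;
    if (threads < 32) threads = 32;
    if (threads > 256) threads = 256;
    return threads;
}

// Scalar fallback: one block per index, threads stride over the row.
__global__ void embedding_scalar(float* __restrict__ out,
                                 const long long* __restrict__ indices,
                                 const float* __restrict__ weight,
                                 int embedding_dim) {
    const long long row = __ldg(&indices[blockIdx.x]);  // read-only load
    const float* src = weight + row * embedding_dim;
    float* dst = out + static_cast<long long>(blockIdx.x) * embedding_dim;
    for (int i = threadIdx.x; i < embedding_dim; i += blockDim.x) {
        dst[i] = __ldg(&src[i]);
    }
}

// Vectorized path: each thread copies 16 bytes (one float4) per step.
// Requires embedding_dim % 4 == 0 and 16-byte-aligned base pointers.
__global__ void embedding_vec4(float* __restrict__ out,
                               const long long* __restrict__ indices,
                               const float* __restrict__ weight,
                               int embedding_dim) {
    const long long row = __ldg(&indices[blockIdx.x]);
    const float4* src =
        reinterpret_cast<const float4*>(weight + row * embedding_dim);
    float4* dst = reinterpret_cast<float4*>(
        out + static_cast<long long>(blockIdx.x) * embedding_dim);
    const int n_vec = embedding_dim / 4;
    for (int i = threadIdx.x; i < n_vec; i += blockDim.x) {
        dst[i] = __ldg(&src[i]);
    }
}

// Host-side dispatch: take the float4 path only when every row start is
// guaranteed to be 16-byte aligned, otherwise fall back to scalar loads.
void launch_embedding(float* out, const long long* indices,
                      const float* weight, int num_indices,
                      int embedding_dim, cudaStream_t stream) {
    if (num_indices <= 0 || embedding_dim <= 0) return;
    const bool vec_ok = embedding_dim % 4 == 0 &&
                        is_aligned(out, 16) && is_aligned(weight, 16);
    if (vec_ok) {
        const int threads = pick_block_size(embedding_dim / 4);
        embedding_vec4<<<num_indices, threads, 0, stream>>>(
            out, indices, weight, embedding_dim);
    } else {
        const int threads = pick_block_size(embedding_dim);
        embedding_scalar<<<num_indices, threads, 0, stream>>>(
            out, indices, weight, embedding_dim);
    }
}
```

In this sketch the alignment check happens once on the host, so the kernels themselves contain no branches on alignment. Because the launch takes an explicit `cudaStream_t` and performs no allocation or synchronization, a launcher of this shape would also be compatible with stream capture; whether the PR achieves graph support this way is not stated in the conversation.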

Development

Successfully merging this pull request may close these issues.

[DEV] Add an embedding operator that supports graph recording
