
@wooway777
Collaborator

resolves #838

Copilot AI left a comment

Pull request overview

This PR adds support for batched RoPE (Rotary Position Embedding) operations in the Cambricon BANG backend by extending the kernel to handle a batch dimension in input tensors.

Key Changes:

  • Extended RoPE kernel to process 4D tensors with batch dimension [batch, seqlen, nhead, dhead]
  • Updated task distribution to handle batch_size × seqlen × nhead workload
  • Added batch stride parameters for input and output tensors
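The batch_size × seqlen × nhead work split and the stride-based addressing can be illustrated with a small Python sketch. This is not the BANG kernel itself. It assumes that each task processes one dhead-length row, that cores receive contiguous, near-equal chunks of the flat task range (a common scheme; the PR's actual split may differ), and that strides are counted in elements. The function names split_tasks and task_to_offset and the stride_b/stride_s/stride_h parameters are hypothetical; task_dim and task_id stand in for BANG's taskDim and taskId.

```python
def split_tasks(total, task_dim, task_id):
    """Return the [start, end) range of flat task indices owned by one core.

    Assumption: an even contiguous split, with the remainder handed to the
    first `rem` cores (one extra task each).
    """
    base, rem = divmod(total, task_dim)
    start = task_id * base + min(task_id, rem)
    end = start + base + (1 if task_id < rem else 0)
    return start, end


def task_to_offset(t, seqlen, nhead, stride_b, stride_s, stride_h):
    """Decompose flat task t over [batch, seqlen, nhead] into (b, s, h, offset).

    offset is the element offset of that row's dhead vector. It is built from
    per-dimension strides, so non-contiguous layouts are addressed correctly.
    """
    b, rest = divmod(t, seqlen * nhead)
    s, h = divmod(rest, nhead)
    return b, s, h, b * stride_b + s * stride_s + h * stride_h
```

For example, with batch=2, seqlen=3, nhead=4 and contiguous dhead=8 rows (strides 96, 32, 8), the 24 tasks split over 5 cores as [0,5), [5,10), [10,15), [15,20), [20,24), and task 13 maps to (b=1, s=0, h=1). Because the offset uses explicit strides rather than assuming contiguity, the same decomposition covers strided input and output tensors.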

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/infiniop/ops/rope/bang/rope_bang_kernel.mlu | Updated kernel signature with batch_size and stride parameters; modified task distribution and index calculations to handle batched inputs; added logic to handle position IDs with or without a batch dimension |
| src/infiniop/ops/rope/bang/rope_bang.mlu | Updated variable names for clarity (dimx→seqlen, dimy→nhead); added batch_size parameter extraction; updated kernel launch to pass batch dimension and stride parameters |
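Handling position IDs with or without a batch dimension can be sketched as a pure-Python reference, useful for checking kernel output. This is a sketch under stated assumptions, not the PR's code: it assumes pos_ids is either [seqlen] (shared by every sequence in the batch) or [batch, seqlen], that sin/cos tables are precomputed with shape [max_pos, dhead/2], and that rotation pairs are interleaved as (x[2i], x[2i+1]). The kernel's actual pairing convention and table layout may differ. All function names are hypothetical.

```python
import math


def position_of(pos_ids, b, s):
    """Position of token s in batch entry b.

    pos_ids is either a flat list [seqlen] shared across the batch,
    or a nested list [batch][seqlen] with per-sequence positions.
    """
    if pos_ids and isinstance(pos_ids[0], list):
        return pos_ids[b][s]
    return pos_ids[s]


def make_tables(max_pos, dhead, base=10000.0):
    """Assumed standard RoPE angles: theta[p][i] = p * base**(-2i/dhead)."""
    half = dhead // 2
    angles = [[p * base ** (-2 * i / dhead) for i in range(half)]
              for p in range(max_pos)]
    sin_t = [[math.sin(a) for a in row] for row in angles]
    cos_t = [[math.cos(a) for a in row] for row in angles]
    return sin_t, cos_t


def rope_batched(x, pos_ids, sin_table, cos_table):
    """Reference RoPE over x shaped [batch][seqlen][nhead][dhead] (nested lists).

    Each interleaved pair (x0, x1) = (v[2i], v[2i+1]) is rotated by the angle
    for this token's position p and frequency index i.
    """
    y = []
    for b, seq in enumerate(x):
        y_seq = []
        for s, heads in enumerate(seq):
            p = position_of(pos_ids, b, s)
            y_heads = []
            for vec in heads:
                out = [0.0] * len(vec)
                for i in range(len(vec) // 2):
                    x0, x1 = vec[2 * i], vec[2 * i + 1]
                    sn, cs = sin_table[p][i], cos_table[p][i]
                    out[2 * i] = x0 * cs - x1 * sn
                    out[2 * i + 1] = x0 * sn + x1 * cs
                y_heads.append(out)
            y_seq.append(y_heads)
        y.append(y_seq)
    return y
```

A shared 1-D pos_ids and an equivalent per-batch 2-D pos_ids should produce identical outputs, which gives a cheap consistency check for both code paths of a batched kernel. Position 0 leaves a vector unchanged, since sin(0) = 0 and cos(0) = 1.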

Development

Successfully merging this pull request may close these issues.

[DEV] Cambricon Batched RoPE
