Paged Attention: migraphx & highlevel pipeline changes #2220

Open
justinrosner wants to merge 5 commits into develop from 46-paged-attention-highlevel

@justinrosner justinrosner commented Jan 30, 2026

Motivation

This PR introduces migraphx and highlevel pipeline support for paged attention. This implements: https://amd-hub.atlassian.net/browse/AIROCMLIR-46

Technical Details

This PR implements the following changes:

  1. New deref op for paged attention
    • Add migraphx.deref that gets lowered to tosa.custom deref op (MIGraphXToTosa)
    • tosa.custom deref will get lowered to rock.deref (TosaToRock)
  2. Extend rock.attention and rock.gridwise_attention_accel with keyAddresses and valueAddresses.
  3. Add DerefOpInterface for rock.deref to support bufferization (conversion to memrefs).
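The lowering chain in item 1 can be pictured as two successive rewrites of the same op. What follows is a toy Python model, not the MLIR implementation: the `Op` class and the `lower_*` functions are hypothetical stand-ins, and only the op names and the pass names (MIGraphXToTosa, TosaToRock) come from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical, heavily simplified stand-in for an IR operation."""
    name: str          # e.g. "migraphx.deref"
    attrs: tuple = ()  # extra tags, e.g. which custom op a tosa.custom represents


def lower_migraphx_to_tosa(op: Op) -> Op:
    # Stand-in for MIGraphXToTosa: migraphx.deref becomes a tosa.custom tagged "deref".
    if op.name == "migraphx.deref":
        return Op("tosa.custom", attrs=("deref",))
    return op


def lower_tosa_to_rock(op: Op) -> Op:
    # Stand-in for TosaToRock: a tosa.custom tagged "deref" becomes rock.deref.
    if op.name == "tosa.custom" and "deref" in op.attrs:
        return Op("rock.deref")
    return op


def lower(ops):
    # Apply the two stages in order; ops that match neither stage pass through unchanged.
    return [lower_tosa_to_rock(lower_migraphx_to_tosa(op)) for op in ops]


pipeline = [Op("migraphx.deref"), Op("migraphx.dot")]
lowered = lower(pipeline)
print([op.name for op in lowered])
```

The model only tracks op names; the real patterns also carry operands, result types, and attributes through each stage.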

Note that rock.deref acts as a deferred (lazy) load descriptor rather than an immediate load operation. At a high level, it does not load anything; it declares:

  • What the data shape would be if you dereferenced the pointers
  • How to access it (via addresses)

The actual memory loads happen much later, during tiled/blockwise lowering. This approach lets us reuse the existing pipeline that applies rock.transforms to the K/V inputs of attention ops, in places like SortDimensionsMemoryLayout.
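The descriptor idea can be illustrated with a small Python model. This is a conceptual sketch only, not how rock.deref is implemented: the `DerefDescriptor` class, its fields, the flat `memory` list, and the one-address-per-page layout are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DerefDescriptor:
    """Hypothetical model: records the shape and the addresses, loads nothing."""
    addresses: list    # assumed layout: one base address per page
    page_shape: tuple  # (rows, cols) of each page

    @property
    def shape(self):
        # The shape the data would have if every pointer were dereferenced.
        return (len(self.addresses),) + self.page_shape

    def load_tile(self, memory, page, row):
        # The actual read, performed only when a later stage asks for a tile.
        cols = self.page_shape[1]
        start = self.addresses[page] + row * cols
        return memory[start:start + cols]


memory = list(range(100))  # toy flat memory
keys = DerefDescriptor(addresses=[40, 0], page_shape=(2, 4))

print(keys.shape)                      # shape is known without touching memory
print(keys.load_tile(memory, 0, 1))    # row 1 of the page based at address 40
```

Because the descriptor exposes a shape up front, shape-level rewrites can be reasoned about before any element is read, which mirrors the motivation given above.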

Test Plan

  • PR LIT tests
  • Paged attention IR samples from MIGraphX

Test Result

  • PR LIT tests

Submission Checklist

@justinrosner justinrosner marked this pull request as ready for review January 30, 2026 18:19
@justinrosner justinrosner requested a review from causten as a code owner January 30, 2026 18:19
Copilot AI review requested due to automatic review settings January 30, 2026 18:19
@justinrosner justinrosner changed the title [WIP] Paged Attention: migraphx & highlevel pipeline changes Paged Attention: migraphx & highlevel pipeline changes Jan 30, 2026
Copilot AI left a comment

Pull request overview

This pull request introduces paged attention support for the migraphx and highlevel pipelines. The implementation adds a new deref operation to handle pointer dereferences for paged memory access, and extends the attention operations to support optional key/value addresses for paged attention.

Changes:

  • Introduces migraphx.deref and rock.deref operations for lazy pointer dereferencing in paged attention
  • Extends rock.attention and rock.gridwise_attention_accel with optional keyAddresses and valueAddresses operands
  • Adds conversion patterns from MIGraphX → TOSA → Rock for the deref operation and paged attention support

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| mlir/include/mlir/Dialect/MIGraphX/IR/MIGraphX.td | Defines migraphx.deref op with UI64 input constraint |
| mlir/include/mlir/Dialect/Rock/IR/RockOps.td | Defines rock.deref op and extends attention ops with address operands |
| mlir/include/mlir/Dialect/Rock/IR/RockTosaCustomOps.h | Adds ROCK_CUSTOMOP_DEREF constant |
| mlir/include/mlir/Conversion/TosaToRock/TosaToRock.h | Declares populateTosaToRockDerefPatterns function |
| mlir/lib/Dialect/MIGraphX/IR/MIGraphX.cpp | Implements verification for migraphx.deref (shape/stride matching) |
| mlir/lib/Dialect/Rock/IR/RockDialect.cpp | Implements verification for rock.deref (rank-3 constraint) and attention ops (paged attention validation) |
| mlir/lib/Conversion/MIGraphXToTosa/MIGraphXToTosa.cpp | Converts migraphx.deref to tosa.custom deref op |
| mlir/lib/Conversion/TosaToRock/TosaToRock.cpp | Converts tosa.custom deref to rock.deref with pattern matching; detects paged K/V in attention |
| mlir/lib/Conversion/TosaToRock/TosaToRockPass.cpp | Adds deref conversion stage before attention patterns |
| mlir/lib/Dialect/Rock/Transforms/BufferizableOpInterfaceImpl.cpp | Implements bufferization for rock.deref |
| mlir/lib/Dialect/Rock/Transforms/GemmToGridwise.cpp | Threads keyAddresses/valueAddresses through AttentionOp lowering |
| mlir/lib/Dialect/Rock/Transforms/SortDimensionsMemoryLayout.cpp | Threads keyAddresses/valueAddresses through AttentionOp rewrite |
| mlir/lib/Dialect/Rock/Transforms/DetectFlashDecoding.cpp | Threads keyAddresses/valueAddresses through flash decoding pattern |
| mlir/tools/rocmlir-gen/rocmlir-gen.cpp | Updates AttentionOp::create call with nullptr for new operands |
| mlir/test/Dialect/Rock/ops_error.mlir | Updates operandSegmentSizes for two test cases |
| mlir/test/Dialect/MIGraphX/ops.mlir | Adds test for migraphx.deref |
| mlir/test/Dialect/MIGraphX/invalid.mlir | Adds negative tests for migraphx.deref |
| mlir/test/Conversion/MIGraphXToTosa/mixr-to-tosa-ops.mlir | Tests migraphx.deref to tosa.custom conversion |
| mlir/test/Conversion/TosaToRock/tosa-to-rock-paged-attention.mlir | Tests deref and paged attention conversions |
