[WIP - NOT READY FOR REVIEW] Paged Attention: rocmlir-gen changes#2222

Open

justinrosner wants to merge 3 commits into 42-paged-attention-rocmlir from 439-paged-attention-rocmlir-gen

Conversation

justinrosner (Contributor) commented Jan 30, 2026

Motivation

This PR adds end-to-end testing infrastructure for paged attention in rocmlir-gen, enabling generation of both GPU kernels and CPU validation functions that properly handle paged K/V caches with shuffled page table addressing.

Implements: https://amd-hub.atlassian.net/browse/AIROCMLIR-439

Technical Details

New command-line options:

--paged-attention          # Enable paged attention mode
--num-pages <N>            # Number of pages in the cache
--page-size <N>            # Elements per page

Example Usage:

rocmlir-gen --operation attention -seq_len_q 1024 -seq_len_k 1024 -head_dim_qk 32 -head_dim_v 32 -t f16 --paged-attention --num-pages 32 --page-size 1024

Key Changes:

  1. GPU Kernel Generation
  • For paged attention, the K/V inputs become page tables instead of data tensors.
  • The kernel generates:
    • rock.deref ops to create virtual views of the paged K/V data
    • Transform chains to reshape [batch, numPages, pageSize] -> [G, seqK, headDim]
    • rock.attention with keyAddresses/valueAddresses pointing to the deref outputs
  2. Host Harness Generation: implements paged-cache testing with a shuffled page ordering (see the sketch after this list)
  3. CPU Validation
  • The validation function receives regular K/V tensors (not page tables), using the logical-order CPU cache for comparison
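
To make the host-harness and validation behaviour concrete, here is a minimal C++ sketch of the relationship between the shuffled paged cache, its page table, and the logical-order K/V data that the CPU reference consumes. This is an illustrative assumption about the mechanism, not code from this PR, and it uses page indices where the real page table holds i64 addresses.

// Hypothetical sketch: shuffled paged cache vs. logical-order K/V.
// In the real harness the page table stores i64 addresses; plain indices
// are used here to keep the example self-contained.
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

int main() {
  const int64_t numPages = 8, pageSize = 16;

  // Physical paged cache: numPages pages of pageSize elements each.
  std::vector<float> pagedCache(numPages * pageSize);

  // Page table with a shuffled ordering: pageTable[logicalPage] = physicalPage.
  std::vector<int64_t> pageTable(numPages);
  std::iota(pageTable.begin(), pageTable.end(), 0);
  std::mt19937 rng(0);
  std::shuffle(pageTable.begin(), pageTable.end(), rng);

  // The harness writes logical page p into its shuffled physical slot.
  for (int64_t p = 0; p < numPages; ++p)
    for (int64_t e = 0; e < pageSize; ++e)
      pagedCache[pageTable[p] * pageSize + e] = static_cast<float>(p * pageSize + e);

  // CPU validation side: gather the pages back into logical order, yielding
  // the regular (contiguous) K/V tensor that the reference attention uses.
  std::vector<float> logicalKV(numPages * pageSize);
  for (int64_t p = 0; p < numPages; ++p)
    for (int64_t e = 0; e < pageSize; ++e)
      logicalKV[p * pageSize + e] = pagedCache[pageTable[p] * pageSize + e];

  return 0;
}

Conceptually, the rock.deref views and transform chains described above should let the GPU kernel see the same logical-order [G, seqK, headDim] data without ever materializing the logicalKV buffer.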

Test Plan

  • Nightly LIT tests

Test Result

  • Nightly LIT tests

Submission Checklist

@justinrosner justinrosner marked this pull request as ready for review January 30, 2026 18:34
@justinrosner justinrosner requested a review from causten as a code owner January 30, 2026 18:34
@justinrosner justinrosner changed the title from "[WIP] Paged Attention: rocmlir-gen changes" to "[WIP - NOT READY FOR REVIEW] Paged Attention: rocmlir-gen changes" on Jan 30, 2026

Copilot AI left a comment

Pull request overview

This pull request adds paged attention support to rocmlir-gen, a code generation tool for MLIR-based ROCm kernels. Paged attention is an optimization technique that allows attention mechanisms to work with non-contiguous memory pages, improving memory efficiency for large language models.
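
As a rough illustration of the addressing involved (a hypothetical helper, not part of rocmlir-gen or this PR), a page table maps a logical position in the K/V sequence to a physical location with simple divide/modulo arithmetic:

// Hypothetical sketch: translate a logical element index into a physical
// address using a page table whose entries are i64 base addresses.
#include <cstdint>
#include <vector>

int64_t logicalToPhysical(const std::vector<int64_t> &pageTable,
                          int64_t pageSize, int64_t logicalIndex) {
  int64_t page = logicalIndex / pageSize;   // which logical page
  int64_t offset = logicalIndex % pageSize; // position within that page
  return pageTable[page] + offset;          // physical element address
}

According to the PR description, the generated kernel expresses the equivalent lookup through rock.deref operations and transforms rather than explicit pointer arithmetic in the harness.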

Changes:

  • Adds command-line options (--paged-attention, --page-size, --num-pages) to enable and configure paged attention mode
  • Modifies attention kernel generation to use page tables (arrays of i64 pointers) instead of direct K/V tensor inputs
  • Implements GPU kernel logic with rock.deref operations to dereference page tables and transform paged data to attention-compatible shapes
  • Adds CPU validation path that reconstructs regular K/V tensors from paged cache for correctness verification
  • Includes comprehensive test coverage with MLIR test file and e2e test configurations

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

  • mlir/tools/rocmlir-gen/rocmlir-gen.cpp
    Core implementation: adds paged attention command-line options, validation logic, GPU kernel generation with page table dereferencing and transforms, CPU validation with cache buffer management and shuffling, and host harness logic for page table population
  • mlir/test/rocmlir-gen/paged-attention-kernel.mlir
    Comprehensive test file verifying the paged attention kernel signature, rock.deref operations, transforms, and the validation function in both single-head and GQA configurations
  • mlir/test/e2e/PrAttentionSchedule.toml
    Adds an e2e test case for paged attention with schedule version 2
  • mlir/test/e2e/PrAttentionI8.toml
    Adds an e2e test case for paged attention with int8 quantization
  • mlir/test/e2e/PrAttentionF32.toml
    Adds an e2e test case for paged attention with the f32 data type
  • mlir/test/e2e/PrAttentionF16.toml
    Adds an e2e test case for paged attention with the f16 data type
  • mlir/test/e2e/PrAttentionDirectToLDS.toml
    Adds an e2e test case for paged attention with the direct-to-LDS optimization
  • mlir/test/e2e/PrAttentionBF16.toml
    Adds an e2e test case for paged attention with the bf16 data type
  • mlir/test/e2e/AttentionSchedule.toml
    Adds an e2e test case for paged attention with the standard schedule
  • mlir/test/e2e/AttentionNonPowerOfTwoTileSize.toml
    Adds an e2e test case for paged attention with non-power-of-two tile sizes

@justinrosner justinrosner force-pushed the 42-paged-attention-rocmlir branch from 9959a7d to fa551da on January 30, 2026 21:56
@justinrosner justinrosner force-pushed the 439-paged-attention-rocmlir-gen branch from 3a81a17 to a172ac8 on January 30, 2026 22:02
@justinrosner justinrosner force-pushed the 42-paged-attention-rocmlir branch from fa551da to 034180c on February 2, 2026 22:14
@justinrosner justinrosner force-pushed the 439-paged-attention-rocmlir-gen branch from a172ac8 to 5f36777 on February 2, 2026 22:18
@justinrosner justinrosner force-pushed the 439-paged-attention-rocmlir-gen branch from 5f36777 to dbea8f0 on February 2, 2026 22:21