Add FP8/BF8 support for LDS transpose load #2210
Open
stefankoncarevic wants to merge 7 commits into develop
Implement ds_read_tr8_b64 offset formulas for FP8/BF8 MFMA (16x32, 32x16). Enable mixed fp8/bf8 type combinations for GEMM operations on gfx950.
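The mixed-type rule in this commit can be sketched as follows. This is a minimal illustration, not the code in LdsTransposeLoad.cpp: the `ElemType` enum and `operandTypesCompatible()` are hypothetical stand-ins, and `areBothFp8Types()` only mirrors the name of the check mentioned in the PR description. The real signature and type representation are assumptions.

```cpp
// Hypothetical element-type tag used only for this sketch; the actual pass
// presumably uses the compiler's own type representation.
enum class ElemType { F16, BF16, FP8, BF8 };

// True for either 8-bit floating-point flavour.
constexpr bool isFp8Like(ElemType t) {
  return t == ElemType::FP8 || t == ElemType::BF8;
}

// Stand-in for the areBothFp8Types() check: both operands are 8-bit floats,
// in any combination (fp8/fp8, fp8/bf8, bf8/fp8, bf8/bf8).
constexpr bool areBothFp8Types(ElemType a, ElemType b) {
  return isFp8Like(a) && isFp8Like(b);
}

// Assumed compatibility rule: operands match exactly, or both are 8-bit
// floats so that mixed fp8/bf8 pairs are accepted.
constexpr bool operandTypesCompatible(ElemType a, ElemType b) {
  return a == b || areBothFp8Types(a, b);
}

static_assert(operandTypesCompatible(ElemType::FP8, ElemType::BF8),
              "mixed fp8/bf8 is accepted");
static_assert(!operandTypesCompatible(ElemType::F16, ElemType::FP8),
              "f16 mixed with fp8 is rejected");
```

Keeping the 8-bit test in its own helper means the exact-match path for other types is left unchanged, which is one plausible reason for a dedicated check.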
Add FP8 GEMM heuristic to selectively disable LDS transpose
Disable LDS transpose for FP8 GEMM when K >= 1280 or for small square matrices (K == N < 512) to avoid performance regressions while preserving compile-time benefits.
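The thresholds in this commit message amount to a simple predicate. The sketch below assumes the decision depends only on K and N; the function name `keepLdsTransposeForFp8Gemm()` and the constant names are hypothetical and do not come from the PR.

```cpp
#include <cstdint>

// Hypothetical helper (not the PR's actual function): returns true when the
// LDS transpose load path should stay enabled for an FP8 GEMM with the given
// K and N. The thresholds are the ones quoted in the commit message.
constexpr bool keepLdsTransposeForFp8Gemm(int64_t k, int64_t n) {
  constexpr int64_t kLargeK = 1280;           // K >= 1280: disable
  constexpr int64_t kSmallSquareLimit = 512;  // K == N < 512: disable
  const bool largeK = k >= kLargeK;
  const bool smallSquare = (k == n) && (k < kSmallSquareLimit);
  return !(largeK || smallSquare);
}

static_assert(!keepLdsTransposeForFp8Gemm(1280, 4096), "large K disables");
static_assert(!keepLdsTransposeForFp8Gemm(256, 256), "small square disables");
static_assert(keepLdsTransposeForFp8Gemm(512, 512), "K == N == 512 is kept");
```

Both comparisons are boundary-sensitive: K == 1280 falls in the disabled range, while K == N == 512 does not.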
Motivation
Add FP8 and BF8 data type support for LDS transpose load optimization on gfx950.
This enables efficient matrix loads using the ds_read_tr8_b64 hardware instruction.
Technical Details
LdsTransposeLoad.cpp: Implemented FP8/BF8 offset formulas in getBasePanelOffsets()
LdsTransposeLoad.cpp: Updated type compatibility check in makeDecision()
areBothFp8Types() check to allow mixed fp8/bf8 combinations
Test Plan
Test Result