[AIROCMLIR-44] Update quick-tune lists for gemm and conv #2212

Open
mirza-halilcevic wants to merge 13 commits into develop from quick-tune-lists-update

@mirza-halilcevic mirza-halilcevic commented Jan 23, 2026

Motivation

Update the gemm and conv quick-tune lists based on exhaustive and greedy tuning.

Technical Details

The following architectures have been updated:

  • gfx950
  • gfx942
  • gfx90a
  • gfx908
  • gfx1201
  • gfx1101
  • gfx1150 (based on exhaustive only)

NOTE: The new quick-tune lists are significantly longer in most cases, reaching 100+ configs for gfx1201 and gfx1101 f16 convolutions.

Test Plan

Compare performance between the old and new quick-tune lists and confirm that the new lists perform better.
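
A minimal sketch of how such a comparison could be bucketed into the categories reported under Test Result. This is not the script used for this PR. It assumes each problem has one measured time under the old list and one under the new list, that the relative change is computed against the old time, and that boundary values fall into the milder bucket. The names `classify` and `summarize` are hypothetical.

```python
from collections import Counter

# Bucket labels, in the order used in the Test Result section.
BUCKETS = [
    "> 20% faster",
    "5-20% faster",
    "Within 5%",
    "5-20% slower",
    "> 20% slower",
]


def classify(old_time, new_time):
    """Bucket one problem by relative change in run time (assumed metric).

    A positive change means the new quick-tune list is faster.
    """
    change = (old_time - new_time) / old_time
    if change > 0.20:
        return "> 20% faster"
    if change > 0.05:
        return "5-20% faster"
    if change >= -0.05:
        return "Within 5%"
    if change >= -0.20:
        return "5-20% slower"
    return "> 20% slower"


def summarize(pairs):
    """Return (label, count, percent) per bucket for (old_time, new_time) pairs."""
    counts = Counter(classify(old, new) for old, new in pairs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no measurements to summarize")
    return [(label, counts[label], 100.0 * counts[label] / total)
            for label in BUCKETS]


if __name__ == "__main__":
    # Made-up timings purely for illustration, not data from this PR.
    sample = [(1.0, 0.7), (1.0, 0.9), (1.0, 1.0), (1.0, 1.1), (1.0, 1.5)]
    for label, count, pct in summarize(sample):
        print(f"{label}: {count} ({pct:.1f}%)")
```

If the benchmark reports throughput (for example TFLOPS) rather than time, the sign of `change` would need to be flipped accordingly.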

Test Result

gfx950

gemm

> 20% faster: 46 (9.9%)
5-20% faster: 147 (31.5%)
Within 5%: 257 (55.0%)
5-20% slower: 15 (3.2%)
> 20% slower: 2 (0.4%)

conv

> 20% faster: 101 (7.4%)
5-20% faster: 380 (27.9%)
Within 5%: 842 (61.8%)
5-20% slower: 40 (2.9%)
> 20% slower: 0 (0.0%)

gfx942

gemm

> 20% faster: 16 (3.4%)
5-20% faster: 124 (26.6%)
Within 5%: 297 (63.6%)
5-20% slower: 30 (6.4%)
> 20% slower: 0 (0.0%)

conv

> 20% faster: 220 (16.1%)
5-20% faster: 363 (26.6%)
Within 5%: 723 (53.0%)
5-20% slower: 56 (4.1%)
> 20% slower: 1 (0.1%)

gfx90a

gemm

> 20% faster: 30 (6.4%)
5-20% faster: 80 (17.1%)
Within 5%: 325 (69.6%)
5-20% slower: 26 (5.6%)
> 20% slower: 6 (1.3%)

conv

> 20% faster: 165 (12.1%)
5-20% faster: 355 (26.0%)
Within 5%: 789 (57.9%)
5-20% slower: 48 (3.5%)
> 20% slower: 6 (0.4%)

gfx908

gemm

> 20% faster: 15 (3.2%)
5-20% faster: 98 (21.0%)
Within 5%: 283 (60.6%)
5-20% slower: 62 (13.3%)
> 20% slower: 9 (1.9%)

conv

> 20% faster: 115 (8.4%)
5-20% faster: 312 (22.9%)
Within 5%: 895 (65.7%)
5-20% slower: 41 (3.0%)
> 20% slower: 0 (0.0%)

gfx1201

gemm

> 20% faster: 101 (21.6%)
5-20% faster: 75 (16.1%)
Within 5%: 274 (58.7%)
5-20% slower: 17 (3.6%)
> 20% slower: 0 (0.0%)

conv

> 20% faster: 63 (4.6%)
5-20% faster: 120 (8.8%)
Within 5%: 1107 (81.2%)
5-20% slower: 71 (5.2%)
> 20% slower: 2 (0.1%)

gfx1101

gemm

> 20% faster: 13 (2.8%)
5-20% faster: 146 (31.3%)
Within 5%: 288 (61.7%)
5-20% slower: 20 (4.3%)
> 20% slower: 0 (0.0%)

conv

> 20% faster: 137 (10.1%)
5-20% faster: 345 (25.3%)
Within 5%: 784 (57.5%)
5-20% slower: 96 (7.0%)
> 20% slower: 1 (0.1%)

gfx1150

gemm

> 20% faster: 45 (9.6%)
5-20% faster: 105 (22.5%)
Within 5%: 236 (50.5%)
5-20% slower: 56 (12.0%)
> 20% slower: 25 (5.4%)

conv

> 20% faster: 101 (7.4%)
5-20% faster: 260 (19.1%)
Within 5%: 741 (54.4%)
5-20% slower: 180 (13.2%)
> 20% slower: 81 (5.9%)

Submission Checklist

@mirza-halilcevic changed the title from "Update quick-tune lists for gemm and conv" to "[AIROCMLIR-44] Update quick-tune lists for gemm and conv" on Jan 23, 2026
Base automatically changed from attn-quick-tune-lists to develop on February 3, 2026, 19:20