@christiangnrd
Member

Implemented while debugging a proper fix for #719 but this is also a standalone contribution.

Not added to the docstrings yet because, just as there are limitations on mixing different vector sizes in the indexing intrinsics, there are also limitations on mixing intrinsic return types. See Section 5.2.3.6 of the Metal Shading Language Specification for details.

@codecov

codecov bot commented Dec 8, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.19%. Comparing base (67d668c) to head (fddfd2e).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #725      +/-   ##
==========================================
+ Coverage   80.90%   81.19%   +0.29%     
==========================================
  Files          59       62       +3     
  Lines        2896     2904       +8     
==========================================
+ Hits         2343     2358      +15     
+ Misses        553      546       -7     

Contributor

@github-actions github-actions bot left a comment

Metal Benchmarks

| Benchmark suite | Current: fddfd2e | Previous: 67d668c | Ratio |
|---|---|---|---|
| latency/precompile | 25183586750 ns | 24820843000 ns | 1.01 |
| latency/ttfp | 2300877084 ns | 2257593833 ns | 1.02 |
| latency/import | 1459733584 ns | 1431203750 ns | 1.02 |
| integration/metaldevrt | 842395.5 ns | 834875 ns | 1.01 |
| integration/byval/slices=1 | 1580958 ns | 1525666.5 ns | 1.04 |
| integration/byval/slices=3 | 19261562.5 ns | 8498958 ns | 2.27 |
| integration/byval/reference | 1578708 ns | 1538166 ns | 1.03 |
| integration/byval/slices=2 | 2670750 ns | 2552562 ns | 1.05 |
| kernel/indexing | 472458 ns | 593833 ns | 0.80 |
| kernel/indexing_checked | 502916.5 ns | 575750 ns | 0.87 |
| kernel/launch | 12417 ns | 11250 ns | 1.10 |
| kernel/rand | 532416.5 ns | 557187.5 ns | 0.96 |
| array/construct | 6333 ns | 6000 ns | 1.06 |
| array/broadcast | 561270.5 ns | 591209 ns | 0.95 |
| array/random/randn/Float32 | 898292 ns | 836917 ns | 1.07 |
| array/random/randn!/Float32 | 594042 ns | 619542 ns | 0.96 |
| array/random/rand!/Int64 | 551583 ns | 548834 ns | 1.01 |
| array/random/rand!/Float32 | 542354.5 ns | 593333 ns | 0.91 |
| array/random/rand/Int64 | 878375 ns | 735667 ns | 1.19 |
| array/random/rand/Float32 | 840104.5 ns | 631792 ns | 1.33 |
| array/accumulate/Int64/1d | 1312625 ns | 1237125 ns | 1.06 |
| array/accumulate/Int64/dims=1 | 1842812 ns | 1795625 ns | 1.03 |
| array/accumulate/Int64/dims=2 | 2248209 ns | 2130458 ns | 1.06 |
| array/accumulate/Int64/dims=1L | 12092500 ns | 11609562.5 ns | 1.04 |
| array/accumulate/Int64/dims=2L | 9940292 ns | 9610834 ns | 1.03 |
| array/accumulate/Float32/1d | 1089125.5 ns | 1111187.5 ns | 0.98 |
| array/accumulate/Float32/dims=1 | 1585375 ns | 1518146 ns | 1.04 |
| array/accumulate/Float32/dims=2 | 1992375 ns | 1836167 ns | 1.09 |
| array/accumulate/Float32/dims=1L | 10231542 ns | 9757375 ns | 1.05 |
| array/accumulate/Float32/dims=2L | 7492709 ns | 7203562.5 ns | 1.04 |
| array/reductions/reduce/Int64/1d | 1333875 ns | 1498333 ns | 0.89 |
| array/reductions/reduce/Int64/dims=1 | 1116938 ns | 1076542 ns | 1.04 |
| array/reductions/reduce/Int64/dims=2 | 1151583 ns | 1129417 ns | 1.02 |
| array/reductions/reduce/Int64/dims=1L | 2042312.5 ns | 2002083.5 ns | 1.02 |
| array/reductions/reduce/Int64/dims=2L | 3920000 ns | 4214895.5 ns | 0.93 |
| array/reductions/reduce/Float32/1d | 767125 ns | 991375 ns | 0.77 |
| array/reductions/reduce/Float32/dims=1 | 801271 ns | 827000 ns | 0.97 |
| array/reductions/reduce/Float32/dims=2 | 822084 ns | 833917 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1L | 1331792 ns | 1305125 ns | 1.02 |
| array/reductions/reduce/Float32/dims=2L | 1801375 ns | 1788375 ns | 1.01 |
| array/reductions/mapreduce/Int64/1d | 1319187.5 ns | 1549292 ns | 0.85 |
| array/reductions/mapreduce/Int64/dims=1 | 1114541.5 ns | 1085333 ns | 1.03 |
| array/reductions/mapreduce/Int64/dims=2 | 1160354 ns | 1201959 ns | 0.97 |
| array/reductions/mapreduce/Int64/dims=1L | 1990854 ns | 2019583 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2L | 3620834 ns | 3628521 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 847708 ns | 1036542 ns | 0.82 |
| array/reductions/mapreduce/Float32/dims=1 | 796729 ns | 819667 ns | 0.97 |
| array/reductions/mapreduce/Float32/dims=2 | 842667 ns | 843917 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 1348896 ns | 1280500 ns | 1.05 |
| array/reductions/mapreduce/Float32/dims=2L | 1804458 ns | 1784500 ns | 1.01 |
| array/private/copyto!/gpu_to_gpu | 556583 ns | 635375 ns | 0.88 |
| array/private/copyto!/cpu_to_gpu | 762375 ns | 786625 ns | 0.97 |
| array/private/copyto!/gpu_to_cpu | 700083.5 ns | 773833 ns | 0.90 |
| array/private/iteration/findall/int | 1585979.5 ns | 1620458 ns | 0.98 |
| array/private/iteration/findall/bool | 1478416 ns | 1430125 ns | 1.03 |
| array/private/iteration/findfirst/int | 2072000 ns | 2024937.5 ns | 1.02 |
| array/private/iteration/findfirst/bool | 2033792 ns | 2010916 ns | 1.01 |
| array/private/iteration/scalar | 3401500 ns | 5600375 ns | 0.61 |
| array/private/iteration/logical | 2683958 ns | 2504521 ns | 1.07 |
| array/private/iteration/findmin/1d | 2291938 ns | 2209917 ns | 1.04 |
| array/private/iteration/findmin/2d | 1542792 ns | 1498584 ns | 1.03 |
| array/private/copy | 833875 ns | 558312.5 ns | 1.49 |
| array/shared/copyto!/gpu_to_gpu | 85500 ns | 82042 ns | 1.04 |
| array/shared/copyto!/cpu_to_gpu | 85042 ns | 79750 ns | 1.07 |
| array/shared/copyto!/gpu_to_cpu | 84708 ns | 82125 ns | 1.03 |
| array/shared/iteration/findall/int | 1588895.5 ns | 1600354 ns | 0.99 |
| array/shared/iteration/findall/bool | 1493771 ns | 1452458 ns | 1.03 |
| array/shared/iteration/findfirst/int | 1694812.5 ns | 1621520.5 ns | 1.05 |
| array/shared/iteration/findfirst/bool | 1650687 ns | 1607916.5 ns | 1.03 |
| array/shared/iteration/scalar | 209708 ns | 202916 ns | 1.03 |
| array/shared/iteration/logical | 2301833 ns | 2386416.5 ns | 0.96 |
| array/shared/iteration/findmin/1d | 1902084 ns | 1799396 ns | 1.06 |
| array/shared/iteration/findmin/2d | 1539312.5 ns | 1500416.5 ns | 1.03 |
| array/shared/copy | 214958 ns | 230791 ns | 0.93 |
| array/permutedims/4d | 2471521 ns | 2358000 ns | 1.05 |
| array/permutedims/2d | 1170500 ns | 1133208 ns | 1.03 |
| array/permutedims/3d | 1772458 ns | 1645604 ns | 1.08 |
| metal/synchronization/stream | 19833.5 ns | 18500 ns | 1.07 |
| metal/synchronization/context | 20417 ns | 19625 ns | 1.04 |

This comment was automatically generated by workflow using github-action-benchmark.
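
For anyone reading the table by hand: the Ratio column is consistent with Current divided by Previous, so values above 1 mean the current commit was slower in that run. The sketch below recomputes the ratio for three rows copied from the table. The helper name `ratio` and the 1.5 cutoff are assumptions made for illustration; they are not taken from github-action-benchmark or from this repository's configuration.

```python
# Rows copied from the benchmark table above: (name, current_ns, previous_ns).
rows = [
    ("latency/precompile", 25183586750, 24820843000),
    ("integration/byval/slices=3", 19261562.5, 8498958),
    ("array/private/iteration/scalar", 3401500, 5600375),
]


def ratio(current_ns, previous_ns):
    """Hypothetical helper: current time over previous time, rounded to 2 places."""
    return round(current_ns / previous_ns, 2)


THRESHOLD = 1.5  # assumed cutoff, chosen only for this illustration

# Map each benchmark name to its recomputed ratio.
ratios = {name: ratio(cur, prev) for name, cur, prev in rows}

# Benchmarks whose ratio exceeds the assumed cutoff (possible slowdowns).
flagged = [name for name, r in ratios.items() if r > THRESHOLD]

for name, r in ratios.items():
    print(f"{name}: {r}")
print("above threshold:", flagged)
```

With these three rows, only `integration/byval/slices=3` exceeds the assumed cutoff. A single run says little about noise, so a flagged row is a prompt to re-run rather than proof of a regression.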
