Add UInt16 argument intrinsics
#725
base: main
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff

| | main | #725 | +/- |
|---|---|---|---|
| Coverage | 80.90% | 81.19% | +0.29% |
| Files | 59 | 62 | +3 |
| Lines | 2896 | 2904 | +8 |
| Hits | 2343 | 2358 | +15 |
| Misses | 553 | 546 | -7 |
Metal Benchmarks
| Benchmark suite | Current: fddfd2e | Previous: 67d668c | Ratio |
|---|---|---|---|
| latency/precompile | 25183586750 ns | 24820843000 ns | 1.01 |
| latency/ttfp | 2300877084 ns | 2257593833 ns | 1.02 |
| latency/import | 1459733584 ns | 1431203750 ns | 1.02 |
| integration/metaldevrt | 842395.5 ns | 834875 ns | 1.01 |
| integration/byval/slices=1 | 1580958 ns | 1525666.5 ns | 1.04 |
| integration/byval/slices=3 | 19261562.5 ns | 8498958 ns | 2.27 |
| integration/byval/reference | 1578708 ns | 1538166 ns | 1.03 |
| integration/byval/slices=2 | 2670750 ns | 2552562 ns | 1.05 |
| kernel/indexing | 472458 ns | 593833 ns | 0.80 |
| kernel/indexing_checked | 502916.5 ns | 575750 ns | 0.87 |
| kernel/launch | 12417 ns | 11250 ns | 1.10 |
| kernel/rand | 532416.5 ns | 557187.5 ns | 0.96 |
| array/construct | 6333 ns | 6000 ns | 1.06 |
| array/broadcast | 561270.5 ns | 591209 ns | 0.95 |
| array/random/randn/Float32 | 898292 ns | 836917 ns | 1.07 |
| array/random/randn!/Float32 | 594042 ns | 619542 ns | 0.96 |
| array/random/rand!/Int64 | 551583 ns | 548834 ns | 1.01 |
| array/random/rand!/Float32 | 542354.5 ns | 593333 ns | 0.91 |
| array/random/rand/Int64 | 878375 ns | 735667 ns | 1.19 |
| array/random/rand/Float32 | 840104.5 ns | 631792 ns | 1.33 |
| array/accumulate/Int64/1d | 1312625 ns | 1237125 ns | 1.06 |
| array/accumulate/Int64/dims=1 | 1842812 ns | 1795625 ns | 1.03 |
| array/accumulate/Int64/dims=2 | 2248209 ns | 2130458 ns | 1.06 |
| array/accumulate/Int64/dims=1L | 12092500 ns | 11609562.5 ns | 1.04 |
| array/accumulate/Int64/dims=2L | 9940292 ns | 9610834 ns | 1.03 |
| array/accumulate/Float32/1d | 1089125.5 ns | 1111187.5 ns | 0.98 |
| array/accumulate/Float32/dims=1 | 1585375 ns | 1518146 ns | 1.04 |
| array/accumulate/Float32/dims=2 | 1992375 ns | 1836167 ns | 1.09 |
| array/accumulate/Float32/dims=1L | 10231542 ns | 9757375 ns | 1.05 |
| array/accumulate/Float32/dims=2L | 7492709 ns | 7203562.5 ns | 1.04 |
| array/reductions/reduce/Int64/1d | 1333875 ns | 1498333 ns | 0.89 |
| array/reductions/reduce/Int64/dims=1 | 1116938 ns | 1076542 ns | 1.04 |
| array/reductions/reduce/Int64/dims=2 | 1151583 ns | 1129417 ns | 1.02 |
| array/reductions/reduce/Int64/dims=1L | 2042312.5 ns | 2002083.5 ns | 1.02 |
| array/reductions/reduce/Int64/dims=2L | 3920000 ns | 4214895.5 ns | 0.93 |
| array/reductions/reduce/Float32/1d | 767125 ns | 991375 ns | 0.77 |
| array/reductions/reduce/Float32/dims=1 | 801271 ns | 827000 ns | 0.97 |
| array/reductions/reduce/Float32/dims=2 | 822084 ns | 833917 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1L | 1331792 ns | 1305125 ns | 1.02 |
| array/reductions/reduce/Float32/dims=2L | 1801375 ns | 1788375 ns | 1.01 |
| array/reductions/mapreduce/Int64/1d | 1319187.5 ns | 1549292 ns | 0.85 |
| array/reductions/mapreduce/Int64/dims=1 | 1114541.5 ns | 1085333 ns | 1.03 |
| array/reductions/mapreduce/Int64/dims=2 | 1160354 ns | 1201959 ns | 0.97 |
| array/reductions/mapreduce/Int64/dims=1L | 1990854 ns | 2019583 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2L | 3620834 ns | 3628521 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 847708 ns | 1036542 ns | 0.82 |
| array/reductions/mapreduce/Float32/dims=1 | 796729 ns | 819667 ns | 0.97 |
| array/reductions/mapreduce/Float32/dims=2 | 842667 ns | 843917 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 1348896 ns | 1280500 ns | 1.05 |
| array/reductions/mapreduce/Float32/dims=2L | 1804458 ns | 1784500 ns | 1.01 |
| array/private/copyto!/gpu_to_gpu | 556583 ns | 635375 ns | 0.88 |
| array/private/copyto!/cpu_to_gpu | 762375 ns | 786625 ns | 0.97 |
| array/private/copyto!/gpu_to_cpu | 700083.5 ns | 773833 ns | 0.90 |
| array/private/iteration/findall/int | 1585979.5 ns | 1620458 ns | 0.98 |
| array/private/iteration/findall/bool | 1478416 ns | 1430125 ns | 1.03 |
| array/private/iteration/findfirst/int | 2072000 ns | 2024937.5 ns | 1.02 |
| array/private/iteration/findfirst/bool | 2033792 ns | 2010916 ns | 1.01 |
| array/private/iteration/scalar | 3401500 ns | 5600375 ns | 0.61 |
| array/private/iteration/logical | 2683958 ns | 2504521 ns | 1.07 |
| array/private/iteration/findmin/1d | 2291938 ns | 2209917 ns | 1.04 |
| array/private/iteration/findmin/2d | 1542792 ns | 1498584 ns | 1.03 |
| array/private/copy | 833875 ns | 558312.5 ns | 1.49 |
| array/shared/copyto!/gpu_to_gpu | 85500 ns | 82042 ns | 1.04 |
| array/shared/copyto!/cpu_to_gpu | 85042 ns | 79750 ns | 1.07 |
| array/shared/copyto!/gpu_to_cpu | 84708 ns | 82125 ns | 1.03 |
| array/shared/iteration/findall/int | 1588895.5 ns | 1600354 ns | 0.99 |
| array/shared/iteration/findall/bool | 1493771 ns | 1452458 ns | 1.03 |
| array/shared/iteration/findfirst/int | 1694812.5 ns | 1621520.5 ns | 1.05 |
| array/shared/iteration/findfirst/bool | 1650687 ns | 1607916.5 ns | 1.03 |
| array/shared/iteration/scalar | 209708 ns | 202916 ns | 1.03 |
| array/shared/iteration/logical | 2301833 ns | 2386416.5 ns | 0.96 |
| array/shared/iteration/findmin/1d | 1902084 ns | 1799396 ns | 1.06 |
| array/shared/iteration/findmin/2d | 1539312.5 ns | 1500416.5 ns | 1.03 |
| array/shared/copy | 214958 ns | 230791 ns | 0.93 |
| array/permutedims/4d | 2471521 ns | 2358000 ns | 1.05 |
| array/permutedims/2d | 1170500 ns | 1133208 ns | 1.03 |
| array/permutedims/3d | 1772458 ns | 1645604 ns | 1.08 |
| metal/synchronization/stream | 19833.5 ns | 18500 ns | 1.07 |
| metal/synchronization/context | 20417 ns | 19625 ns | 1.04 |
This comment was automatically generated by workflow using github-action-benchmark.
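
For readers skimming the table, the Ratio column appears to be the current time divided by the previous time, rounded to two decimals, so values above 1 mean the current commit was slower on that benchmark. This is inferred from the numbers themselves, not taken from the github-action-benchmark documentation, and the names `ratio` and `rows` are hypothetical. A minimal sketch in Python that reproduces a few of the reported ratios:

```python
def ratio(current_ns: float, previous_ns: float) -> float:
    """Current time divided by previous time, rounded to two decimals.

    Assumption: this matches how the Ratio column is derived; it is
    inferred from the table values, not from the tool's source.
    """
    return round(current_ns / previous_ns, 2)


# (current, previous) timings in nanoseconds, copied from the table.
rows = {
    "integration/byval/slices=3": (19261562.5, 8498958),
    "kernel/indexing": (472458, 593833),
    "array/private/iteration/scalar": (3401500, 5600375),
}

for name, (current, previous) in rows.items():
    print(f"{name}: {ratio(current, previous):.2f}")
```

Single-run timings like these can be noisy, so a large ratio on one benchmark is not by itself evidence of a regression caused by the change.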
Implemented while working on a proper fix for #719, but this also stands on its own as a contribution.
Not added to the docstrings yet: just as there are limitations on mixing different vector sizes in the indexing intrinsics, there are also limitations on mixing intrinsic return types. See Section 5.2.3.6 of the Metal Shading Language Specification for details.