Use Metal.jl native rand by default #727
base: main
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage diff (main vs. #727):
| Metric | main | #727 | +/- |
|---|---|---|---|
| Coverage | 80.90% | 81.15% | +0.25% |
| Files | 59 | 62 | +3 |
| Lines | 2896 | 2892 | -4 |
| Hits | 2343 | 2347 | +4 |
| Misses | 553 | 545 | -8 |
Metal Benchmarks
| Benchmark suite | Current: c3fe826 | Previous: 67d668c | Ratio |
|---|---|---|---|
| latency/precompile | 24862918625 ns | 24820843000 ns | 1.00 |
| latency/ttfp | 2268736334 ns | 2257593833 ns | 1.00 |
| latency/import | 1438339834 ns | 1431203750 ns | 1.00 |
| integration/metaldevrt | 832083 ns | 834875 ns | 1.00 |
| integration/byval/slices=1 | 1546583 ns | 1525666.5 ns | 1.01 |
| integration/byval/slices=3 | 8758312.5 ns | 8498958 ns | 1.03 |
| integration/byval/reference | 1536791.5 ns | 1538166 ns | 1.00 |
| integration/byval/slices=2 | 2611292 ns | 2552562 ns | 1.02 |
| kernel/indexing | 568792 ns | 593833 ns | 0.96 |
| kernel/indexing_checked | 579000 ns | 575750 ns | 1.01 |
| kernel/launch | 11625 ns | 11250 ns | 1.03 |
| kernel/rand | 548834 ns | 557187.5 ns | 0.99 |
| array/construct | 6166 ns | 6000 ns | 1.03 |
| array/broadcast | 585000 ns | 591209 ns | 0.99 |
| array/random/randn/Float32 | 991229.5 ns | 836917 ns | 1.18 |
| array/random/randn!/Float32 | 723750 ns | 619542 ns | 1.17 |
| array/random/rand!/Int64 | 576625 ns | 548834 ns | 1.05 |
| array/random/rand!/Float32 | 652750 ns | 593333 ns | 1.10 |
| array/random/rand/Int64 | 835500 ns | 735667 ns | 1.14 |
| array/random/rand/Float32 | 865021 ns | 631792 ns | 1.37 |
| array/accumulate/Int64/1d | 1241854.5 ns | 1237125 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 1799479 ns | 1795625 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 2185291.5 ns | 2130458 ns | 1.03 |
| array/accumulate/Int64/dims=1L | 11377104 ns | 11609562.5 ns | 0.98 |
| array/accumulate/Int64/dims=2L | 9754937.5 ns | 9610834 ns | 1.01 |
| array/accumulate/Float32/1d | 1121125 ns | 1111187.5 ns | 1.01 |
| array/accumulate/Float32/dims=1 | 1523333 ns | 1518146 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 1852250 ns | 1836167 ns | 1.01 |
| array/accumulate/Float32/dims=1L | 9861979 ns | 9757375 ns | 1.01 |
| array/accumulate/Float32/dims=2L | 7196083 ns | 7203562.5 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 1433416 ns | 1498333 ns | 0.96 |
| array/reductions/reduce/Int64/dims=1 | 1072292 ns | 1076542 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 1111667 ns | 1129417 ns | 0.98 |
| array/reductions/reduce/Int64/dims=1L | 1994917 ns | 2002083.5 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 4212604.5 ns | 4214895.5 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 1006437.5 ns | 991375 ns | 1.02 |
| array/reductions/reduce/Float32/dims=1 | 809292 ns | 827000 ns | 0.98 |
| array/reductions/reduce/Float32/dims=2 | 842292 ns | 833917 ns | 1.01 |
| array/reductions/reduce/Float32/dims=1L | 1290833 ns | 1305125 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2L | 1793750 ns | 1788375 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 1526979 ns | 1549292 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1 | 1077917 ns | 1085333 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2 | 1116375 ns | 1201959 ns | 0.93 |
| array/reductions/mapreduce/Int64/dims=1L | 2008000 ns | 2019583 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2L | 3606833.5 ns | 3628521 ns | 0.99 |
| array/reductions/mapreduce/Float32/1d | 962250 ns | 1036542 ns | 0.93 |
| array/reductions/mapreduce/Float32/dims=1 | 827500 ns | 819667 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2 | 836708 ns | 843917 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1L | 1296187.5 ns | 1280500 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2L | 1808292 ns | 1784500 ns | 1.01 |
| array/private/copyto!/gpu_to_gpu | 627583 ns | 635375 ns | 0.99 |
| array/private/copyto!/cpu_to_gpu | 768459 ns | 786625 ns | 0.98 |
| array/private/copyto!/gpu_to_cpu | 784584 ns | 773833 ns | 1.01 |
| array/private/iteration/findall/int | 1572958.5 ns | 1620458 ns | 0.97 |
| array/private/iteration/findall/bool | 1425333.5 ns | 1430125 ns | 1.00 |
| array/private/iteration/findfirst/int | 2096209 ns | 2024937.5 ns | 1.04 |
| array/private/iteration/findfirst/bool | 2024396 ns | 2010916 ns | 1.01 |
| array/private/iteration/scalar | 3959334 ns | 5600375 ns | 0.71 |
| array/private/iteration/logical | 2637208 ns | 2504521 ns | 1.05 |
| array/private/iteration/findmin/1d | 2224729.5 ns | 2209917 ns | 1.01 |
| array/private/iteration/findmin/2d | 1514292 ns | 1498584 ns | 1.01 |
| array/private/copy | 569917 ns | 558312.5 ns | 1.02 |
| array/shared/copyto!/gpu_to_gpu | 83917 ns | 82042 ns | 1.02 |
| array/shared/copyto!/cpu_to_gpu | 81083 ns | 79750 ns | 1.02 |
| array/shared/copyto!/gpu_to_cpu | 81625 ns | 82125 ns | 0.99 |
| array/shared/iteration/findall/int | 1574833.5 ns | 1600354 ns | 0.98 |
| array/shared/iteration/findall/bool | 1427250 ns | 1452458 ns | 0.98 |
| array/shared/iteration/findfirst/int | 1633042 ns | 1621520.5 ns | 1.01 |
| array/shared/iteration/findfirst/bool | 1622396 ns | 1607916.5 ns | 1.01 |
| array/shared/iteration/scalar | 208792 ns | 202916 ns | 1.03 |
| array/shared/iteration/logical | 2241354 ns | 2386416.5 ns | 0.94 |
| array/shared/iteration/findmin/1d | 1823708 ns | 1799396 ns | 1.01 |
| array/shared/iteration/findmin/2d | 1520042 ns | 1500416.5 ns | 1.01 |
| array/shared/copy | 246375 ns | 230791 ns | 1.07 |
| array/permutedims/4d | 2368666 ns | 2358000 ns | 1.00 |
| array/permutedims/2d | 1137229.5 ns | 1133208 ns | 1.00 |
| array/permutedims/3d | 1664041 ns | 1645604 ns | 1.01 |
| metal/synchronization/stream | 18833 ns | 18500 ns | 1.02 |
| metal/synchronization/context | 19709 ns | 19625 ns | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
f1262da to c3fe826
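There seems to be a bit of a performance hit with uniform rand: the array/random/rand and rand! benchmarks above are 1.05x to 1.37x slower than the previous commit. Maybe we keep using MPS for those, since they weren't causing issues?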
Closes #474
In my very unscientific tests (generating a large batch of values and plotting a histogram), the quality looks similar to MPS, but without the NaN generation reported in #474.
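For anyone who wants to repeat that kind of informal check, here is a minimal sketch of one way to do it. It is not the script used for this PR: `uniform_summary` is a hypothetical helper, and the sample comes from the CPU `rand` so the snippet runs without a GPU. On Apple hardware one would presumably pass `Array(Metal.rand(Float32, n))` instead.

```julia
using Random

# Hypothetical helper (not part of Metal.jl): count NaNs and bin the finite
# values of a sample that is expected to be uniform on [0, 1).
function uniform_summary(xs::AbstractVector{<:AbstractFloat}; nbins::Int = 10)
    nan_count = count(isnan, xs)
    counts = zeros(Int, nbins)
    for x in xs
        isnan(x) && continue
        # Map x in [0, 1) to a bin index in 1:nbins; clamp guards the edges.
        bin = clamp(floor(Int, x * nbins) + 1, 1, nbins)
        counts[bin] += 1
    end
    return (nan_count = nan_count, counts = counts)
end

# CPU stand-in for the GPU sample (an assumption, so the sketch runs anywhere).
xs = rand(MersenneTwister(0), Float32, 100_000)
stats = uniform_summary(xs)

println("NaNs: ", stats.nan_count)
println("bin counts: ", stats.counts)
```

Roughly equal bin counts and a NaN count of zero are what a healthy uniform generator should show. This is only a coarse sanity check, not a statistical test of generator quality.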
In a future PR, if we really want to default to Apple-provided random number generation when it is supported, we could wrap the MPSGraph RNG functionality, but that may incur a performance hit.