@christiangnrd
Member

Close #474

In my very unscientific tests (generating a bunch of values and plotting a histogram), the quality seems similar to MPS, but without the NaN generation reported in #474.

In a future PR, if we really want to default to Apple-provided random number generation when supported, we could wrap the MPSGraph RNG functionality, but that may incur a performance hit.
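
For reference, the kind of sanity check described above (count NaNs, then eyeball a histogram) can be sketched roughly as follows. This is a CPU-only stand-in in plain Python, not the Metal.jl or MPS code path: `summarize_samples` is a hypothetical helper, and `random.gauss` merely takes the place of whatever generator is under test.

```python
import math
import random


def summarize_samples(samples, lo=-4.0, hi=4.0, bins=16):
    """Count NaNs and bin the remaining in-range samples into fixed-width bins."""
    nan_count = sum(1 for x in samples if math.isnan(x))
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in samples:
        # Skip NaNs and anything outside [lo, hi), including infinities.
        if math.isnan(x) or x < lo or x >= hi:
            continue
        # Clamp the index to guard against floating-point rounding at the top edge.
        index = min(int((x - lo) / width), bins - 1)
        counts[index] += 1
    return nan_count, counts


# Stand-in generator: standard normal samples from Python's own RNG.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

nan_count, counts = summarize_samples(samples)
print("NaN count:", nan_count)

# Crude text histogram: one '#' per 500 samples in each 0.5-wide bin.
for i, count in enumerate(counts):
    left_edge = -4.0 + i * 0.5
    print(f"{left_edge:+.1f} {'#' * (count // 500)}")
```

A non-zero NaN count, or a visibly lopsided histogram, would be the signal to look closer. This only catches gross problems and says nothing about statistical quality beyond that.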

@codecov

codecov bot commented Dec 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.15%. Comparing base (67d668c) to head (c3fe826).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #727      +/-   ##
==========================================
+ Coverage   80.90%   81.15%   +0.25%     
==========================================
  Files          59       62       +3     
  Lines        2896     2892       -4     
==========================================
+ Hits         2343     2347       +4     
+ Misses        553      545       -8     

Contributor

@github-actions github-actions bot left a comment


Metal Benchmarks

| Benchmark suite | Current: c3fe826 | Previous: 67d668c | Ratio |
|---|---|---|---|
| latency/precompile | 24862918625 ns | 24820843000 ns | 1.00 |
| latency/ttfp | 2268736334 ns | 2257593833 ns | 1.00 |
| latency/import | 1438339834 ns | 1431203750 ns | 1.00 |
| integration/metaldevrt | 832083 ns | 834875 ns | 1.00 |
| integration/byval/slices=1 | 1546583 ns | 1525666.5 ns | 1.01 |
| integration/byval/slices=3 | 8758312.5 ns | 8498958 ns | 1.03 |
| integration/byval/reference | 1536791.5 ns | 1538166 ns | 1.00 |
| integration/byval/slices=2 | 2611292 ns | 2552562 ns | 1.02 |
| kernel/indexing | 568792 ns | 593833 ns | 0.96 |
| kernel/indexing_checked | 579000 ns | 575750 ns | 1.01 |
| kernel/launch | 11625 ns | 11250 ns | 1.03 |
| kernel/rand | 548834 ns | 557187.5 ns | 0.99 |
| array/construct | 6166 ns | 6000 ns | 1.03 |
| array/broadcast | 585000 ns | 591209 ns | 0.99 |
| array/random/randn/Float32 | 991229.5 ns | 836917 ns | 1.18 |
| array/random/randn!/Float32 | 723750 ns | 619542 ns | 1.17 |
| array/random/rand!/Int64 | 576625 ns | 548834 ns | 1.05 |
| array/random/rand!/Float32 | 652750 ns | 593333 ns | 1.10 |
| array/random/rand/Int64 | 835500 ns | 735667 ns | 1.14 |
| array/random/rand/Float32 | 865021 ns | 631792 ns | 1.37 |
| array/accumulate/Int64/1d | 1241854.5 ns | 1237125 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 1799479 ns | 1795625 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 2185291.5 ns | 2130458 ns | 1.03 |
| array/accumulate/Int64/dims=1L | 11377104 ns | 11609562.5 ns | 0.98 |
| array/accumulate/Int64/dims=2L | 9754937.5 ns | 9610834 ns | 1.01 |
| array/accumulate/Float32/1d | 1121125 ns | 1111187.5 ns | 1.01 |
| array/accumulate/Float32/dims=1 | 1523333 ns | 1518146 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 1852250 ns | 1836167 ns | 1.01 |
| array/accumulate/Float32/dims=1L | 9861979 ns | 9757375 ns | 1.01 |
| array/accumulate/Float32/dims=2L | 7196083 ns | 7203562.5 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 1433416 ns | 1498333 ns | 0.96 |
| array/reductions/reduce/Int64/dims=1 | 1072292 ns | 1076542 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 1111667 ns | 1129417 ns | 0.98 |
| array/reductions/reduce/Int64/dims=1L | 1994917 ns | 2002083.5 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 4212604.5 ns | 4214895.5 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 1006437.5 ns | 991375 ns | 1.02 |
| array/reductions/reduce/Float32/dims=1 | 809292 ns | 827000 ns | 0.98 |
| array/reductions/reduce/Float32/dims=2 | 842292 ns | 833917 ns | 1.01 |
| array/reductions/reduce/Float32/dims=1L | 1290833 ns | 1305125 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2L | 1793750 ns | 1788375 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 1526979 ns | 1549292 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1 | 1077917 ns | 1085333 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2 | 1116375 ns | 1201959 ns | 0.93 |
| array/reductions/mapreduce/Int64/dims=1L | 2008000 ns | 2019583 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2L | 3606833.5 ns | 3628521 ns | 0.99 |
| array/reductions/mapreduce/Float32/1d | 962250 ns | 1036542 ns | 0.93 |
| array/reductions/mapreduce/Float32/dims=1 | 827500 ns | 819667 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2 | 836708 ns | 843917 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1L | 1296187.5 ns | 1280500 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2L | 1808292 ns | 1784500 ns | 1.01 |
| array/private/copyto!/gpu_to_gpu | 627583 ns | 635375 ns | 0.99 |
| array/private/copyto!/cpu_to_gpu | 768459 ns | 786625 ns | 0.98 |
| array/private/copyto!/gpu_to_cpu | 784584 ns | 773833 ns | 1.01 |
| array/private/iteration/findall/int | 1572958.5 ns | 1620458 ns | 0.97 |
| array/private/iteration/findall/bool | 1425333.5 ns | 1430125 ns | 1.00 |
| array/private/iteration/findfirst/int | 2096209 ns | 2024937.5 ns | 1.04 |
| array/private/iteration/findfirst/bool | 2024396 ns | 2010916 ns | 1.01 |
| array/private/iteration/scalar | 3959334 ns | 5600375 ns | 0.71 |
| array/private/iteration/logical | 2637208 ns | 2504521 ns | 1.05 |
| array/private/iteration/findmin/1d | 2224729.5 ns | 2209917 ns | 1.01 |
| array/private/iteration/findmin/2d | 1514292 ns | 1498584 ns | 1.01 |
| array/private/copy | 569917 ns | 558312.5 ns | 1.02 |
| array/shared/copyto!/gpu_to_gpu | 83917 ns | 82042 ns | 1.02 |
| array/shared/copyto!/cpu_to_gpu | 81083 ns | 79750 ns | 1.02 |
| array/shared/copyto!/gpu_to_cpu | 81625 ns | 82125 ns | 0.99 |
| array/shared/iteration/findall/int | 1574833.5 ns | 1600354 ns | 0.98 |
| array/shared/iteration/findall/bool | 1427250 ns | 1452458 ns | 0.98 |
| array/shared/iteration/findfirst/int | 1633042 ns | 1621520.5 ns | 1.01 |
| array/shared/iteration/findfirst/bool | 1622396 ns | 1607916.5 ns | 1.01 |
| array/shared/iteration/scalar | 208792 ns | 202916 ns | 1.03 |
| array/shared/iteration/logical | 2241354 ns | 2386416.5 ns | 0.94 |
| array/shared/iteration/findmin/1d | 1823708 ns | 1799396 ns | 1.01 |
| array/shared/iteration/findmin/2d | 1520042 ns | 1500416.5 ns | 1.01 |
| array/shared/copy | 246375 ns | 230791 ns | 1.07 |
| array/permutedims/4d | 2368666 ns | 2358000 ns | 1.00 |
| array/permutedims/2d | 1137229.5 ns | 1133208 ns | 1.00 |
| array/permutedims/3d | 1664041 ns | 1645604 ns | 1.01 |
| metal/synchronization/stream | 18833 ns | 18500 ns | 1.02 |
| metal/synchronization/context | 19709 ns | 19625 ns | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@christiangnrd
Member Author

There seems to be a bit of a performance hit with uniform rand (up to 1.37x for array/random/rand/Float32 in the benchmarks above). Maybe we keep using MPS for those, since they weren't causing issues?
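
To put numbers on that, here is a small sketch that recomputes the Ratio column (current time divided by previous time) for the `array/random` rows of the benchmark report and sorts them worst-first. The timings are copied from the report above; the variable names are only illustrative.

```python
# (name, current ns at c3fe826, previous ns at 67d668c), copied from the report.
rows = [
    ("array/random/randn/Float32", 991229.5, 836917.0),
    ("array/random/randn!/Float32", 723750.0, 619542.0),
    ("array/random/rand!/Int64", 576625.0, 548834.0),
    ("array/random/rand!/Float32", 652750.0, 593333.0),
    ("array/random/rand/Int64", 835500.0, 735667.0),
    ("array/random/rand/Float32", 865021.0, 631792.0),
]

# Ratio > 1 means the PR branch is slower than the base commit.
ratios = {name: current / previous for name, current, previous in rows}

# Sort from largest slowdown to smallest.
worst_first = sorted(ratios.items(), key=lambda item: item[1], reverse=True)

for name, ratio in worst_first:
    print(f"{name:32s} {ratio:.2f}x")
```

Note that the randn rows are slower too (about 1.17x to 1.18x), so any decision to keep MPS for uniform rand would be a trade-off specific to the functions that were not producing NaNs.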

Development

Successfully merging this pull request may close these issues.

Metal.randn! produces Nan
