Skip to content

Conversation

@scottmcm
Copy link
Member

(Inspired by this conversation on Discord https://discord.com/channels/273534239310479360/592856094527848449/1452407229843247197)

This PR removes the Simd<_, _>: Index and Simd<_, _>: IndexMut trait implementations, and instead adds get and set methods.

IMHO this is a good idea as v[1] = 2; looks innocuous but is actually surprisingly bad for codegen, as it forces the vector to be in-memory in order to return the &T -- and there's no way to return a T from Index. One can always v.as_mut_array()[1] = 2; if you want (that still works) but that extra speed bump is, I posit, a good thing to help push people to better methods.

What are those better methods? Well, this also adds Simd::set and Simd::get using the simd_{insert,extract}_dyn intrinsics internally. Thus instead of let x = v[i];, you'd write let x = v.get(i);. Doing that solves the optimization problem that the person was asking about on discord, and seeing that it did fix it (and that there's no use of simd_extract_dyn in the current library) is why I went and made this PR 🙂

Discussion topics:

@scottmcm scottmcm force-pushed the no-index-trait branch 2 times, most recently from 78e6c6b to 5be2886 Compare December 22, 2025 05:13
@workingjubilee
Copy link
Member

Oh, I thought we already added get and set? I guess I forgot.

@scottmcm
Copy link
Member Author

@workingjubilee I was surprised too, TBH :P

I was expecting to say "oh, don't use as_mut_array, use set" and then it didn't exist...

@calebzulawski
Copy link
Member

Looks good to me. I think the _dyn variations of the intrinsics are "new" which is why we didn't originally implement it (we did implement it for masks, since they can't implement Index). Implementing Index was always a bit controversial--I expected the optimizer to handle it but if not, removing it is probably best.

@scottmcm scottmcm force-pushed the no-index-trait branch 2 times, most recently from a35167d to 72cc10d Compare December 22, 2025 05:32
@scottmcm
Copy link
Member Author

I think the _dyn variations of the intrinsics are "new"

They are, but helpfully they have fallback mir so they'll get working implementation for free, and that way so long as they've been in nightly long enough we don't need to worry about things like going and implementing them in cg_clif.

I expected the optimizer to handle it but if not

I'm surprised it doesn't, for what seems like a simple case, but overall I think discouraging address versions of things is good anyway -- we'll still have AsRef and AsMut and as_array and as_mut_array, so we're not blocking anything, just nudging the "wait, this is really not what simd is particularly good at" more strongly.

/// `idx` must be in-bounds (`idx < N`)
#[inline]
unsafe fn get_inner(self, idx: usize) -> T {
// FIXME: This is a workaround for a CI failure in #498
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea what the problem was, but LLVM was very unhappy for some reason

rustc-LLVM ERROR: Cannot select: 0x7ff0c871e2a0: v16i8 = setcc 0x7ff0c871ee00, 0x7ff0ca14b000, setne:ch
  0x7ff0c871ee00: v8i16,ch = CopyFromReg 0x7ff0c871e8c0:1, Register:v8i16 %21
  0x7ff0ca14b000: v8i16 = xor 0x7ff0c8720a80, 0x7ff0c871db60
    0x7ff0c8720a80: v8i16,ch = CopyFromReg 0x7ff0c871e070:1, Register:v8i16 %11
    0x7ff0c871db60: v8i16 = splat_vector Constant:i32<-1>

@DouglasDwyer
Copy link

DouglasDwyer commented Dec 22, 2025

Thanks for following up on our Discord conversation! Is there any chance that Simd::get and Simd::set could be marked as const functions? (Perhaps this would use const_eval_select if the intrinsics are not callable at compile time, or maybe the intrinsics should be const too?)

@programmerjake
Copy link
Member

imo &mut self is fine, we currently pass Simd through memory for all function arguments/returns anyway and llvm is capable of optimizing those out as long as the loads/stores are vector-typed.

@sammysheep
Copy link
Contributor

I couldn't reach the discord for the discussion but we also use the indexing a bit. However, it's usually in a scalar context where we are preparing or inspecting a data structure before or after any hot loop work.

Messing with individual elements at a time always seemed to be for convenience not performance to me, but switching to get/set makes sense if this is a footgun.

Happy to benchmark this branch on Apple M4 if you need more data with a relevant function.

@programmerjake
Copy link
Member

imo &mut self is fine, we currently pass Simd through memory for all function arguments/returns anyway and llvm is capable of optimizing those out as long as the loads/stores are vector-typed.

both passing by value and by &mut Simd produce almost identical LLVM IR: https://rust.godbolt.org/z/T9WdcWh33

@workingjubilee
Copy link
Member

imo &mut self is fine, we currently pass Simd through memory for all function arguments/returns anyway and llvm is capable of optimizing those out as long as the loads/stores are vector-typed.

I would rather we not rely on this optimization pass, as I have been looking into compiler improvements which remove this requirement.

@programmerjake
Copy link
Member

I would rather we not rely on this optimization pass, as I have been looking into compiler improvements which remove this requirement.

by that do you mean changing the function call ABI to pass values in registers, or changing rustc to not store temporary values in allocas, or something else? imo unless both of those changes are made, LLVM still has to optimize vector loads/stores into just SSA values so &mut self seems unlikely to be a problem

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants