Draft
When an instruction uses LMUL_X with X > 1, Ara injects one reshuffle at a time for each register between Vn and V(n+X-1) that has an EEW mismatch. Each of these reshuffles targets a different register Vm with LMUL_1, but from the point of view of the hazard checks on the next instruction that depends on Vn with LMUL_X, they all target the same register (Vn with LMUL_X). We cannot inject a single macro reshuffle, since the registers between Vn and V(n+X-1) can have different encodings. We therefore need finer-grained reshuffles, and these mess up the dependency tracking.

For example, vst @, v0 (LMUL_8) uses the registers from v0 to v7. If all of them are reshuffled, we end up with 8 reshuffle instructions that get IDs from 0 to 7. The store then sees a dependency only on the reshuffle ID that targets v0. This is wrong: once the v0 reshuffle is over, if the store opreq is faster than the slide opreq, the store violates the RAW dependency on the registers that are still being reshuffled.

To avoid this, the safest and most suboptimal fix is to wait in WAIT_IDLE after a reshuffle with LMUL > 1. There are several possible optimizations:
1) When LMUL > 1, check whether we reshuffled more than one register. If we reshuffled only one register, we can skip WAIT_IDLE.
2) Check whether all X registers need to be reshuffled (the common case). If so, inject a single large reshuffle with LMUL_X and skip WAIT_IDLE.
3) Instead of waiting in WAIT_IDLE, inject the reshuffles starting from V(n+X-1) instead of Vn. This automatically adjusts the dependency check and slightly speeds up the whole operation.
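The hazard described above, and why injecting the reshuffles in reverse order would help, can be illustrated with a small toy model. This is only a sketch in Python, not Ara's RTL: the function names are hypothetical, and it assumes that (a) the consumer waits only for the ID of the reshuffle that targets Vn, and (b) the reshuffles complete in ID order.

```python
def mismatched_registers(vn, lmul, stored_eew, wanted_eew):
    """Registers in Vn..V(n+lmul-1) whose stored EEW differs from the wanted one."""
    return [r for r in range(vn, vn + lmul) if stored_eew[r] != wanted_eew]


def assign_ids(regs, reverse=False):
    """Give each LMUL_1 reshuffle a sequential ID in injection order."""
    order = sorted(regs, reverse=reverse)
    return {reg: insn_id for insn_id, reg in enumerate(order)}


def unguarded_registers(ids, vn):
    """Registers whose reshuffle is injected after the one the consumer waits for.

    Assumption: the consumer tracks only the reshuffle that targets Vn,
    and Vn itself is among the reshuffled registers.
    """
    waited = ids[vn]
    return sorted(reg for reg, insn_id in ids.items() if insn_id > waited)


# vst @, v0 (LMUL_8): v0..v7 are all stored with EEW 8 but needed with EEW 64.
stored_eew = {reg: 8 for reg in range(8)}
regs = mismatched_registers(0, 8, stored_eew, 64)

forward = assign_ids(regs)                 # v0 gets ID 0, v7 gets ID 7
backward = assign_ids(regs, reverse=True)  # v7 gets ID 0, v0 gets ID 7

print(unguarded_registers(forward, 0))   # [1, 2, 3, 4, 5, 6, 7]
print(unguarded_registers(backward, 0))  # []
```

Under these assumptions, forward injection leaves v1 to v7 unprotected once the v0 reshuffle finishes, while reverse injection makes the Vn reshuffle the last one, so waiting for it covers the whole register group.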
Signed-off-by: Moritz Imfeld <moimfeld@student.ethz.ch>
Closed
Description of PR that completes issue here...
Changelog
Fixed
Added
Changed
Checklist
Please check our contributing guidelines before opening a Pull Request.