[feat] FU-level fusion #244
base: main
Conversation
…into hardware_merge
HardwarePattern(int64_t i, const std::string& n, int64_t f);
};

struct HardwareSlot {
Please define the HW slot in a comment, with an example. FUs in the same slot cannot be executed at the same time. Which cases is a slot good for?
Added. You can check it.
Hmm, disabling simultaneous execution within one slot still seems unnecessary to me. Slots can also be merged or split.
Given this, I am still hesitant to see "slot" in our definition. I feel it over-complicates the design. Can you try to avoid introducing "slot" and see whether the algorithm still works?
I agree. I have removed the concept of "slot"; when mapping, we now only consider FUs as the building blocks of hardware templates. The results without "slot" are the same as before.
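To make the "FUs as building blocks" idea concrete, here is a minimal sketch and not the code in this PR. It assumes each pattern can be summarized as a count of FUs per kind, that every FU has unit cost, and it ignores the connections between FUs. The names `FuCount`, `mergeTemplates`, `covers`, and `cost` are hypothetical.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical summary of a pattern or template: FU kind -> number of instances.
using FuCount = std::map<std::string, int>;

// Builds one shared template by taking, for every FU kind, the largest count
// that any single pattern needs (assumption: patterns run one at a time).
FuCount mergeTemplates(const std::vector<FuCount>& patterns) {
  FuCount merged;
  for (const auto& pattern : patterns) {
    for (const auto& [kind, count] : pattern) {
      merged[kind] = std::max(merged[kind], count);
    }
  }
  return merged;
}

// Returns true if the template has at least as many FUs of each kind as the
// pattern requires. Connectivity between FUs is not checked in this sketch.
bool covers(const FuCount& tmpl, const FuCount& pattern) {
  for (const auto& [kind, count] : pattern) {
    auto it = tmpl.find(kind);
    if (it == tmpl.end() || it->second < count) return false;
  }
  return true;
}

// Total FU count of a template (assumption: unit cost per FU).
int cost(const FuCount& tmpl) {
  int total = 0;
  for (const auto& [kind, count] : tmpl) total += count;
  return total;
}
```

With patterns `{add: 2, mul: 1}` and `{add: 1, shl: 1}`, the merged template under these assumptions is `{add: 2, mul: 1, shl: 1}`, which covers both patterns with four FUs instead of the five needed if each pattern had its own hardware.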
include/NeuraDialect/NeuraPasses.td
Outdated
Algorithm:
1. Extract pattern DAGs with topological structure from fused_op regions
2. Group patterns by structural similarity (same DAG shape)
3. For structurally similar patterns, create hardware templates where
Please refactor the "slot" description.
removed it.
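For readers following the thread, step 2 of the algorithm quoted above (grouping patterns by DAG shape) could be sketched roughly as follows. This is an illustration under stated assumptions, not the pass's implementation: `PatternDag`, `ShapeKey`, `shapeOf`, and `groupByShape` are hypothetical names, and the sketch assumes node indices are already assigned in a consistent canonical order, so that two DAGs with the same shape produce the same edge list.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an extracted pattern; not the PR's HardwarePattern.
struct PatternDag {
  int64_t id;
  std::vector<std::string> ops;            // op name per node
  std::vector<std::pair<int, int>> edges;  // (producer, consumer) node indices
};

// Shape key: node count plus sorted edge list. Op names are deliberately
// ignored, so patterns with different ops but the same wiring share a key.
using ShapeKey = std::pair<std::size_t, std::vector<std::pair<int, int>>>;

ShapeKey shapeOf(const PatternDag& pattern) {
  auto edges = pattern.edges;
  std::sort(edges.begin(), edges.end());
  return {pattern.ops.size(), edges};
}

// Maps each distinct shape to the ids of the patterns that have it.
std::map<ShapeKey, std::vector<int64_t>> groupByShape(
    const std::vector<PatternDag>& patterns) {
  std::map<ShapeKey, std::vector<int64_t>> groups;
  for (const auto& pattern : patterns) {
    groups[shapeOf(pattern)].push_back(pattern.id);
  }
  return groups;
}
```

Comparing edge lists is only a stand-in for a real structural check. If node numbering is not canonical, a proper graph isomorphism test or a canonical labeling step would be needed before grouping.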
//===- HardwareMergePass.cpp - Hardware Template Merging Pass -------------===//
//
// This pass maximizes pattern coverage with minimum hardware cost by merging
// patterns into shared hardware templates. It supports slot bypassing,
"slot bypassing"?
removed it.
This pass implements FU-level fusion, which runs after DFG-level fusion (PR 194). It aims to find the minimum FU cost that covers all patterns extracted by the previous passes (wrapped in fused_op). The algorithm can be depicted as below: