@HobbitQia
Collaborator

This pass implements FU-level fusion, which runs after DFG-level fusion (PR 194). It aims to find the minimum-cost set of FUs that covers all patterns extracted by the previous passes (each wrapped in a fused_op).

The algorithm proceeds as follows:

  1. Pattern Extraction: Extracts fused operation patterns from the module and linearizes them via topological sort.
  2. Standalone Operation Extraction: Collects standalone operations not inside fused patterns for hardware coverage.
  3. Template Creation: Greedily merges patterns into shared hardware templates using cost-based accommodation with DFS mapping search.
  4. Connection Generation: Generates optimized slot connections based on pattern dependencies with bypass support.
  5. Execution Plan Generation: Creates parallel execution stages by grouping operations at the same topological level.
  6. JSON Output: Writes the hardware configuration, including templates, connections, and execution plans, to a JSON file.
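To illustrate steps 1 and 5, here is a minimal sketch of how a pattern DAG could be split into topological levels. It is not the code in this PR: the `Dag` alias and the `buildStages` name are hypothetical, nodes are plain integer ids, and the input is assumed to be acyclic. Concatenating the returned stages gives one valid linearization, and the nodes inside a stage have no dependencies on each other.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical pattern DAG: node ids are 0..n-1 and edges[u] lists the
// consumers of node u.
using Dag = std::vector<std::vector<int>>;

// Returns stages[k] = nodes at topological level k (Kahn-style traversal).
// Assumes the input graph is acyclic.
std::vector<std::vector<int>> buildStages(const Dag& edges) {
  const std::size_t n = edges.size();

  // Count incoming edges for every node.
  std::vector<int> indegree(n, 0);
  for (const auto& succs : edges)
    for (int v : succs) ++indegree[v];

  // Level 0 holds every node without a producer.
  std::vector<int> frontier;
  for (std::size_t u = 0; u < n; ++u)
    if (indegree[u] == 0) frontier.push_back(static_cast<int>(u));

  std::vector<std::vector<int>> stages;
  while (!frontier.empty()) {
    std::vector<int> next;
    // A node joins the next level once its last producer has been visited.
    for (int u : frontier)
      for (int v : edges[u])
        if (--indegree[v] == 0) next.push_back(v);
    stages.push_back(std::move(frontier));
    frontier = std::move(next);
  }
  return stages;
}
```

For a diamond-shaped pattern (node 0 feeds nodes 1 and 2, which both feed node 3), this yields three stages, with nodes 1 and 2 sharing the middle one.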

@HobbitQia requested a review from tancheng on January 22, 2026, 13:34
HardwarePattern(int64_t i, const std::string& n, int64_t f);
};

struct HardwareSlot {
Contributor

Please define and comment the HW slot, with an example. FUs in the same slot cannot execute at the same time. In which cases is a slot useful?

Collaborator Author

Added. You can check it.

Contributor

Hmm, disabling simultaneous execution within one slot still seems unnecessary to me. Slots can also be merged or split.

Given this, I am still hesitant to have "slot" in our definition. I feel it over-complicates the design. Can you try to avoid introducing "slot" and see whether the algorithm still works?

Collaborator Author

I agree. I have removed the "slot" concept; when mapping, we now consider only FUs as the building blocks of hardware templates. The results without "slot" are the same as before.
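As a rough illustration of what "FUs as building blocks" can look like, here is a deliberately simplified sketch of greedy template merging. It is not the implementation in this PR: it models a pattern or template only as a count of FUs per kind, ignores connectivity and the DFS mapping search, and the names `FuCounts`, `extraCost`, `mergeGreedy`, and the per-kind cost table are all hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: a pattern or template is a count of FUs per kind.
using FuCounts = std::map<std::string, int>;
using FuCostTable = std::map<std::string, int64_t>;

// Cost of the FUs that `tmpl` still lacks in order to host `pattern`.
int64_t extraCost(const FuCounts& tmpl, const FuCounts& pattern,
                  const FuCostTable& fuCost) {
  int64_t cost = 0;
  for (const auto& entry : pattern) {
    auto it = tmpl.find(entry.first);
    const int have = (it == tmpl.end()) ? 0 : it->second;
    if (entry.second > have)
      cost += (entry.second - have) * fuCost.at(entry.first);
  }
  return cost;
}

// Greedily places each pattern into the template that needs the cheapest
// extension; opens a new template when no existing one saves any cost.
std::vector<FuCounts> mergeGreedy(const std::vector<FuCounts>& patterns,
                                  const FuCostTable& fuCost) {
  std::vector<FuCounts> templates;
  for (const auto& pattern : patterns) {
    // Cost of building the pattern on its own, with nothing shared.
    int64_t best = extraCost({}, pattern, fuCost);
    FuCounts* target = nullptr;
    for (auto& tmpl : templates) {
      const int64_t cost = extraCost(tmpl, pattern, fuCost);
      if (cost < best) {
        best = cost;
        target = &tmpl;
      }
    }
    if (target == nullptr) {
      templates.push_back(pattern);
      continue;
    }
    // Grow the chosen template so it covers the pattern's FU needs.
    for (const auto& entry : pattern) {
      int& have = (*target)[entry.first];
      have = std::max(have, entry.second);
    }
  }
  return templates;
}
```

In this toy model, a pattern needing one `add` and one `mul` followed by a pattern needing two `add` FUs ends up in a single template with two `add` and one `mul`, while a pattern that shares no FU kind with any template gets a template of its own.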

Algorithm:
1. Extract pattern DAGs with topological structure from fused_op regions
2. Group patterns by structural similarity (same DAG shape)
3. For structurally similar patterns, create hardware templates where
Contributor

Please refactor the "slot" description.

Collaborator Author

Removed it.

//===- HardwareMergePass.cpp - Hardware Template Merging Pass -------------===//
//
// This pass maximizes pattern coverage with minimum hardware cost by merging
// patterns into shared hardware templates. It supports slot bypassing,
Contributor

"slot bypassing"?

Collaborator Author

Removed it.
