-
Notifications
You must be signed in to change notification settings - Fork 55
feat[cartesian]: DaCe bridge refactor: OIR -> TreeIR -> ScheduleTree -> SDFG #2067
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This still needs proper symbols (and thus fails upon call time).
These are translated to `tn.ForScope` and then get mapped to state machine loops in the resulting SDFG. The problems were twofold 1. Wrong bounds in oir to tree IR visitor 2. Memlet propagation missing support for subsets.Indices() Includes a bit of cleanup when creating the `tn.ForScope`.
This is a WIP commit adding support for variable K offset reads. The FieldAccess visitor has been extended to resolve variable K offsets and dace had to be updated to do proper subset intersection (between ranges and indices) in MemletDict and MemletSet (iirc these are new in the stree-to-sdfg branch, so not being super polished makes sense). I've been testing with `test_variable_offsets` from `test_code_generation.py` and this test currently fails SDFG validation with an out of bounds memlet.
Verbose the gtscript func usage in stencil definitions
… for symbols pre-defined. Added temporary stencil to stencil definition + Field2D capacity
oir to tree: introduce ContextPushPop to reduce boilerplate Add a simple while stencil to stencil_definitions
We don't know how to properly "inline" typed temporaries into a Tasklet. Let's make sure that DDE isn't trying to remove access nodes of thins it can't handle later on when patching the Tasklet.
|
Alright - the issue with temporaries is squashed (and worked around where we can't control it). I think this is ready for show. I've tested with pyFV3 and the AI2 data and this validates all major stencils over there in orchestration as well as stencil mode. |
This comment was marked as resolved.
This comment was marked as resolved.
I went over the code of this PR "one last time" and after getting some distance I found a couple small things: - dace_backend: get ranges only for defined axes - oir->tasklet: simplify tasklet generation - oir->tasklet: add note on volume computation for VariableK offsets - oir->tasklet: add doc strings for memlet generation functions - oir->tasklet: fix VariableKOffset memlets (missing offset) - Various typos / language improvements in comments and ADRs
romanc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did "one last code review" resulting in the last commit c34ccc2.
@FlorianDeconinck have a quick look at the changes to memlet range computation when adding fields to the wrapper sdfg (in dace_backend).
Also, I figured, we missed to offset the i/j axes for nodes with VariableKOffset. The K-axis was okay (we add the offset when we build the FieldAccess in the code generation of the Tasklet. This might help with problems that we have seen in physics codes.
…ndexing" This reverts commit f17fd7c.
|
cscs-ci run |
1 similar comment
|
cscs-ci run |
twicki
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great first step!
|
cscs-ci run |
2 similar comments
|
cscs-ci run |
|
cscs-ci run |
PR GridTools/gt4py#2067 in GT4Py fundamentally changed how the `dace:*` backends behave. In that PR, we changed the strategy to make use of an upcoming DaCe feature called "Schedule Tree", which we will use for optimization purposes. This new "bridge" between GT4Py and DaCe, allows for a much cleaner design where both packages handle nothing more than what they need to. A drop in performance is to be expected (especially on CPU) as we have deactivated local caching for now. But the very next task is to re-use this new platform to allow for much more improved and aggressive merging capacities, local caching and hardware-driven tiling. This PR updates GT4Py to a version that includes the above mentioned "Schedule tree bridge" and removes NDSL-level optimizations in the orchestration pipeline. As said, this will come with a temporary dip in performance, which we plan to restore with upcoming pull requests. In addition to updating the GT4Py and DaCe versions, we include the following two changes in this PR 1. Expose compiler optimization level as `GT4PY_COMPILE_OPT_LEVEL`, defaults to `3` as before. 2. Minor change in import style in `ndsl/dsl/dace/orchestration.py`. --------- Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com> Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
## Description In the latest gt4py/dace bridge refactor (on the cartesian side)[^1], we slipped in a bunch of type enhancements. This PR pull out what is independent of the bridge refactor. In particular this PR brings - generically typed base registry - typed frontend and backend registries - typed `BackendCodegen` constructors - new-style type hints in touched files ## Requirements - [x] All fixes and/or new features come with corresponding tests. Covered by existing tests. - [ ] Important design decisions have been documented in the appropriate ADR inside the [docs/development/ADRs/](docs/development/ADRs/README.md) folder. N/A [^1]: see GridTools#2067 if you are interested
…orkflow (GridTools#2098) ## Description Refactors and cleanups from PR GridTools#2067, because that PR is way too large and we shouldn't bury refactors in there. ## Requirements - [x] All fixes and/or new features come with corresponding tests. New tests are added with this PR. - [ ] Important design decisions have been documented in the appropriate ADR inside the [docs/development/ADRs/](docs/development/ADRs/README.md) folder. N/A --------- Co-authored-by: Enrique González Paredes <enriqueg@cscs.ch>
…end (GridTools#2103) ## Description This PR brings another two cleanup commits: 1. `gtir` doesn't need to be passed around when calling `BackendCodegen` classes. That info can be derived from the backend (class), which is a member of `BackendCodegen`. 2. Use type hints in `dace_backend` (only for the functions that we aren't about to torch anyway with PR GridTools#2067) ## Requirements - [ ] All fixes and/or new features come with corresponding tests. Covered by existing tests. - [ ] Important design decisions have been documented in the appropriate ADR inside the [docs/development/ADRs/](docs/development/ADRs/README.md) folder. N/A
PR GridTools/gt4py#2067 in GT4Py fundamentally changed how the `dace:*` backends behave. In that PR, we changed the strategy to make use of an upcoming DaCe feature called "Schedule Tree", which we will use for optimization purposes. This new "bridge" between GT4Py and DaCe, allows for a much cleaner design where both packages handle nothing more than what they need to. A drop in performance is to be expected (especially on CPU) as we have deactivated local caching for now. But the very next task is to re-use this new platform to allow for much more improved and aggressive merging capacities, local caching and hardware-driven tiling. This PR updates GT4Py to a version that includes the above mentioned "Schedule tree bridge" and removes NDSL-level optimizations in the orchestration pipeline. As said, this will come with a temporary dip in performance, which we plan to restore with upcoming pull requests. In addition to updating the GT4Py and DaCe versions, we include the following two changes in this PR 1. Expose compiler optimization level as `GT4PY_COMPILE_OPT_LEVEL`, defaults to `3` as before. 2. Minor change in import style in `ndsl/dsl/dace/orchestration.py`. --------- Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com> Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
## Description This is a small follow-up from when we moved to the schedule tree in PR #2067. Looks like we left some test utils. `mypy` isn't complaining about the undefined import paths because we skip it for all of tests and python isn't complaining because the code never runs. ## Requirements - [ ] All fixes and/or new features come with corresponding tests. N/A - [ ] Important design decisions have been documented in the appropriate ADR inside the [docs/development/ADRs/](docs/development/ADRs/README.md) folder. N/A Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
Description
Current bridge goes from OIR to SDFG unexpanded, then expand to a full SDFG. The bridge has accumulated tech debt, which we have been cleaning up, but barriers to new feature implementation remains.
We have been thinking about leveraging the Schedule Tree representation of DaCe for some time. The idea behind that representation is that we should have an easier time to do schedule related optimizations (e.g. loop re-ordering based on CPU/GPU target and (same-axis) loop merges).
This PR refactors the GT4Py - DaCe bridge to OIR -> TreeIR -> ScheduleTree -> SDFG -> codegen, details in the ADR.
Note, we might might amend a commit or two to fix/change the data layout to map the current loop structure, but no big changes are planned to go into this branch anymore. So we marked it as ready for review.
Requirements