-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Use cumsum from flox #10987
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Use cumsum from flox #10987
Conversation
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
|
|
||
| # return result | ||
|
|
||
| actual = apply_ufunc( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this is the way. eventually I'd like the apply_ufunc for reductions to live in Xarray too. So feel free to move that over if it helps. We could put it in flox_compat.py
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
for more information, see https://pre-commit.ci
| keep_attrs: bool | None = None, | ||
| **kwargs: Any, | ||
| ) -> Dataset: | ||
| raise NotImplementedError() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made these changes manually now.
I'm not getting pytest-accept to correctly fix the docstrings in _aggregations.py, it's for example not indenting correctly. I'm not sure if this is just a Windows 10 thing.
| else: | ||
| # TODO: Remove drop_vars when GH6528 is fixed | ||
| # when Dataset.cumsum propagates indexes, and the group variable? | ||
| assert_identical(expected.drop_vars(["x", "group_id"]), actual) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping this until min_version of flox is 0.10.5 at least.
Coordinates and docstrings might differ between using flox or not now though. Is that ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not ok imo. I think it might be fixed by simply propagating coordinates in the non-flox branch of the templated code. Might be easy
|
Tests are passing now! But there's a lot of deactivated options left and there was quite a bit of extra code in |
| wrapper, | ||
| obj, | ||
| *codes, | ||
| # input_core_dims=input_core_dims, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need this because we just want the full array forwarded
| kwargs=dict( | ||
| func=func, | ||
| skipna=skipna, | ||
| expected_groups=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be the same as _flox_reduce. This is an important optimization.
| dtype=None, | ||
| method=None, | ||
| engine=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These we should grab from kwargs and forward just like _flox_reduce
| # exclude_dims=set(dim_tuple), | ||
| # output_core_dims=[output_core_dims], | ||
| dask="allowed", | ||
| # dask_gufunc_kwargs=dict( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please delete.
| # for xarray's test_groupby_duplicate_coordinate_labels | ||
| # exclude_dims=set(dim_tuple), | ||
| # output_core_dims=[output_core_dims], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please delete
| ("cumprod", [1.0, 2.0, 6.0, 6.0, 2.0, 2.0]), | ||
| ], | ||
| ) | ||
| def test_resample_cumsum(method: str, expected_array: list[float]) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not use flox for the resample case. It could break catastrophically if we don't support method="blockwise" or "cohorts" in flox.
| assert_identical(expected, actual) | ||
|
|
||
|
|
||
| def test_groupby_cumsum() -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing here is pretty light. We'll want at least two more cases:
- grouping by a single nD variable
- grouping by multiple variables.
whats-new.rstapi.rstThe non-flox version reduces chunksizes significantly:
With flox the chunksize is retained: