-
-
Notifications
You must be signed in to change notification settings - Fork 53
Implement support for ndimage.map_coordinates #237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Implementation: pre-map coordinates onto the chunks of 'image' and execute `ndimage.map_coordinates` for every chunk.
|
A good test would be to see how it performs for your use case @jni. |
Yes, this would be very interesting to see |
|
First, my apologies - I keep sitting down to look at this, and then deciding I need to come back later when I feel I can give it enough attention. Then the same thing happens again later. TestsI like the structure of the test validation, comparing the results from scipy.ndimage.map_coordinates with the new Dask implementation. I'd prefer each test case have an explicit assert statement at the end of it, the two actual test functions are not immediately clear with what they're checking. numpy specific codeThere is lots of numpy specific code here. That might cause a separate headache later on, with our goal to implement support for other, non-numpy array libraries (eg: cupy). I don't think that's a reason not to do it, especially if there are groups who would use a map_coordinates feature (Juan, possibly Adam Ginsburg). But I would prefer if we could minimise or eliminate calls to ReadabilityIf you could add comments to the main code function, indicating which parts handle each of the four steps you outline in the first comment on this PR thread, I think that would improve readability. Also on readability, the docstring will need to be fleshed out a bit (but perhaps a bit later, this might be putting the cart before the horse right now). ... I still don't have thoughtful comments on your actual implementation (mostly I haven't fully understood it yet). Will try to sit down and give it another go soon-ish, but no promises! |
|
Just wanted to say that I would find this feature very useful for making https://github.com/astropy/reproject more compatible with dask inputs! |
dask_image/ndinterp/__init__.py
Outdated
| `ndimage.map_coordinates` for every chunk. | ||
| """ | ||
|
|
||
| coordinates = np.asarray(coordinates) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This currently assumes that coordinates is a small enough array that it will fit in memory, but what if both image and coordinates are large arrays and would both not fit in memory? Or is it unlikely that this would be the case?
…ified code. Clarified tests.
|
Hey @GenevieveBuckley, first of all I really appreciate you having a look at this PR and giving your feedback and comments! Second of all apologies back for letting this sit still for quite a while now. I had worked on modifications some months back and am only pushing them now.
Regarding the assert statements, it's probably not super clear that they're performed within
That's a very good point to keep in mind. I eliminated some explicit conversions to numpy arrays, but we'd probably need to rewrite some things when implementing more general support.
Great idea, I added comments explaining the concrete steps that are performed in the code. That should hopefully make things clearer.
Definitely, I'm open for suggestions. Actually, would there currently be a way to use |
Hope the comments and explanations in the code make it clearer. I'm also linking a part of the previous discussion we had here: #235 (comment) |
|
Hey @astrofrog, thanks for your comment :) Yes you're right that this implementation doesn't work with chunked coordinate arrays. Allowing both the image and coordinate arrays to be dask arrays makes the problem much more complex. I guess this is because before the coordinates are present in memory, it's not known which chunks of the image array they need to be mapped onto. See this discussion #235. I'd tend to think that while it's not necessarily safe to assume the coordinate array to be small in all use cases, it's reasonable to assume the coordinate array is at least smaller than the image array. Because in the contrary case, it's probably more efficient to arrange the coordinates on a regular grid or in some other sensible way. Also, this is true for most of my use cases (all?) of What would be your thoughts on this? |
|
For my use case (https://github.com/astropy/reproject), the coordinates might have a similar number of elements as the input array - we are essentially transforming images to different coordinate systems which can involve arbitrary distortions etc. We are currently able to deal with large arrays of coordinates in |
|
Interesting. In the case of having regular coordinates I was thinking that sth like What's also true though is that as you mention it's relatively straight forward to split up the coordinate array and call I guess this could in principle be done using the |
This makes a lot of sense to me. I can't see why it wouldn't work. |
|
Thanks for your comment @jni! I'll try to implement this in the next week(s) - extending this PR's |
chunked coordinate arrays, which requires two compute passes. Adapted tests.
|
Happy New Year!
Finally I pushed an implementation of this, which allows also the I faced a little difficulty worth mentioning here. Following the approach we discussed above, when both
To achieve this I used output = da.map_blocks(
_map_single_coordinates_array_chunk,
delayed(da.Array)(input.dask, input.name, input.chunks, input.dtype),
coordinates,
...)(Edit: I posted a question regarding this on the dask discourse forum). Apart from that, I adapted the tests and the implementation is consistent with the @astrofrog Would you be able to test |
Just adding here that I just came across |
|
Happy new year! I quite like your little hack, it seems like a pretty elegant sidestep around the problem to me. I commented in more detail on the discourse thread. |
|
Just dropping in to say that I am in awe. 😍 Happy New Year indeed! 😃 As a side note the RTD action failure appears to be unrelated to this PR, probably to do with some sphinx/jinja2 update... File "/home/docs/checkouts/readthedocs.org/user_builds/dask-image/conda/237/lib/python3.7/site-packages/sphinx/util/rst.py", line 22, in <module>
from jinja2 import environmentfilter
ImportError: cannot import name 'environmentfilter' from 'jinja2' (/home/docs/checkouts/readthedocs.org/user_builds/dask-image/conda/237/lib/python3.7/site-packages/jinja2/__init__.py) |
|
Thanks for having a closer look at the hack and your comments on the dask discourse @GenevieveBuckley :) So far it seems to be an okay solution.
You're right, pinning |
|
@m-albert - this functionality is a blocker for some of the work that we want to do in reproject, so I am considering whether to vendor this implementation in reproject for now, or whether you think this might be merged in the near future? No worries either way but just wanted to check before proceeding. |
|
fwiw I'm happy for this to be merged as-is. Thanks @m-albert! |
|
@astrofrog Thanks for the ping here! I think we should go ahead with merging this PR and making
On my to do list was:
I experimented with the distributed scheduler and cannot reproduce the memory explosions I observed earlier. For both OOM input and output arrays I think the only sensible scheduler to use in combination with
This would basically mean having somebody review this PR 🤗.
@jakirkham @GenevieveBuckley What do you think? Thank you for your input 🙏🙏 |
|
@jakirkham @GenevieveBuckley - just to check, is this something you would have time to review? This would be very useful for the reproject package, and I'd rather avoid vendoring the code here if at all possible. Thanks! |
| "affine_transform", | ||
| "rotate", | ||
| "spline_filter", | ||
| "spline_filter1d", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this include map_coordinates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. For this function to support GPU we'd need to additionally use the dispatched function in the main function.
Unless you're interested in it I suggest to leave this to a next PR to not convolute things further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@m-albert - just to be clear, I just mean adding map_coordinates to __all__ to expose it in the public API for the package
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy for you to do it
|
fyi Juan sent me a text message a couple of days ago to tell me to take a look at this again, thanks Juan! |
…d by this implementation
|
Very impressive work @m-albert! Really excellent. I have:
Question:
If my understanding is correct, does this mean we should be directing users to run their own prefilter first, using the Presumably that approach would lead to correct output values, matching the output results from scipy more exactly? |
|
I'm going to merge this and try and make a pre-release in the next day or so. If anyone has comments on improving the docstring information on prefilter, let me know (we can always make a separate mini-PR to improve the docs) |
|
fyi there is a dask-image pre-release with map_coordinates now available: https://github.com/dask/dask-image/releases/tag/v2025.11.0rc1 @astrofrog are you able to try that out and let me know if you find any problems before I make the full release? |
|
Thank you! I will test this out today |
|
@GenevieveBuckley It looks like the PyPI upload failed: https://github.com/dask/dask-image/actions/runs/19283237380/job/55138875502 I think you should be able to fix the issue on PyPI and just rerun the deploy step. In the mean time, I'll install from the tag. |
What if |
|
@GenevieveBuckley - I can confirm this works great for reproject (I have a PR at astropy/reproject#532 that adds support for it). I think once this release is out, it would be worth investigating properly implementing |
That's a great idea, I've made an issue with your suggestion here: #419 |
|
Also, release 2015.11.0 is out now and available on pypi, which includes the new map_coordinates function. |
|
Thanks so much for getting this over the finish line @GenevieveBuckley and @astrofrog 🙏🙏 And @jni of course! |

This PR suggests a
dask-imageimplementation forndimage.map_coordinateswhich covers the case in which both the coordinatethe provided coordinates fit into memory(update after #237 (comment)) and theinputarrays consist of a (potentially large) dask array. See #235.Implementation details:
ndimage.map_coordinatesSome tests are included, plan is to add some more following feedback on the choices of the implementation.