Merged
We need to check the exit code of each joined process in the `run_tasks()` function in `multi_proc.py`. A non-zero exit status (especially a negative one, which means the process was killed by a signal) indicates that an error occurred, in which case the output queue (`out_q`) may be empty. If this situation is not handled, the `out_q.get()` call a few lines further on will hang.
tammojan
approved these changes
May 6, 2025
Bug description
A deadlock may occur in the function `run_tasks()` in `multi_proc.py` if `out_q` is empty. This may happen if one or more of the started processes die unexpectedly.
Solution
There are two ways to resolve this problem:
`out_q` and check if it's empty. If so, some results are apparently missing.
In both cases an exception should be raised. The second solution, checking the exit code of each joined process, has the advantage that you can more easily access the exit code of the first process that died and use it in the exception message. This MR therefore implements the second solution.
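The exit-code check can be sketched as follows. This is a minimal illustration and not the actual code of `run_tasks()`: the names `run_checked`, `_square`, `WorkerDiedError` and the `crash_index` parameter are hypothetical, the crash is simulated with `SIGKILL`, and the `fork` start method is assumed (POSIX only).

```python
import multiprocessing as mp
import os
import signal


class WorkerDiedError(RuntimeError):
    """Hypothetical exception: a worker exited with a non-zero status."""


def _square(x, out_q, crash):
    # Hypothetical worker. If `crash` is set, it kills itself before
    # producing a result, so nothing is ever put on the queue.
    if crash:
        os.kill(os.getpid(), signal.SIGKILL)
    out_q.put(x * x)


def run_checked(values, crash_index=None):
    ctx = mp.get_context("fork")  # assumption: POSIX host
    out_q = ctx.Queue()
    procs = [
        ctx.Process(target=_square, args=(v, out_q, i == crash_index))
        for i, v in enumerate(values)
    ]
    for p in procs:
        p.start()
    # Joining before draining the queue is only safe for small payloads;
    # large items should be fetched before join().
    for p in procs:
        p.join()
    # A negative exit code -N means the process was killed by signal N.
    failed = [p for p in procs if p.exitcode != 0]
    if failed:
        raise WorkerDiedError(
            f"worker {failed[0].name} died with exit code {failed[0].exitcode}"
        )
    # Every worker exited cleanly, so one result per worker is available
    # and get() cannot block forever.
    return sorted(out_q.get() for _ in procs)


if __name__ == "__main__":
    print(run_checked([1, 2, 3]))
    try:
        run_checked([1, 2, 3], crash_index=1)
    except WorkerDiedError as exc:
        print(exc)
```

Without the check, the final loop would call `out_q.get()` once per worker and block on the result that the dead worker never produced.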
How to test
To test that this solution works, make sure that one of the underlying processes dies with a `SIGSEGV`. This can, for example, be accomplished by reverting this change in `src/c++/MGFunction1.cc`. Doing so will raise a `SIGSEGV` at runtime when the software is compiled and run with Numpy 2.x.
Note
Building the software with Numpy 2.x is not trivial, because on most host systems, all packages are still compiled against Numpy 1.x. Follow the approach used in the CI pipeline https://github.com/lofar-astron/PyBDSF/blob/master/.github/workflows/ci.yml to build the software in a clean docker container.
It may be easier to just raise a signal in the aforementioned C++ file, instead of triggering the `SIGSEGV` through an incorrect call of a Numpy function (because that requires Numpy 2.x for building the software). The software can then be built on the host system. Just raising the signal should yield the same effect.
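To see what the parent process observes in that case, the crash can also be imitated in pure Python. The sketch below is only a stand-in for the C++ change described above, not part of this MR: `_crash`, `crashed_exit_code` and `describe_exit` are hypothetical names, and the `fork` start method (POSIX) and Python 3.8+ (for `signal.raise_signal`) are assumed.

```python
import multiprocessing as mp
import signal


def _crash():
    # Hypothetical worker: raise SIGSEGV in this process, standing in
    # for a segmentation fault inside the compiled extension.
    signal.raise_signal(signal.SIGSEGV)


def crashed_exit_code():
    ctx = mp.get_context("fork")  # assumption: POSIX host
    p = ctx.Process(target=_crash)
    p.start()
    p.join()
    return p.exitcode


def describe_exit(exitcode):
    # multiprocessing reports "killed by signal N" as exit code -N.
    if exitcode is not None and exitcode < 0:
        return f"killed by {signal.Signals(-exitcode).name}"
    return f"exited with status {exitcode}"


if __name__ == "__main__":
    print(describe_exit(crashed_exit_code()))
```

The negative exit code is what the check in `run_tasks()` can report in its exception message.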