Skip to content

Conversation

@mohamed-laarej
Copy link
Collaborator

Description:

This PR resolves issue #756 by updating test files to use NumPy’s modern Generator API and fixing mypy tests

Changes:

  • Replaced legacy np.random functions (np.random.randint, np.random.choice, ...) with rng.integers, rng.choice, etc., across test files.
  • Added global rng = np.random.default_rng(seed=42) in test files, with local rng overrides where needed for reproducibility (simulate_cnv_discordant_read_calls).
  • Resolved mypy errors with type casting in affected files.

Testing:

  • mypy malariagen_data tests --ignore-missing-imports: No issues.
  • pre-commit run --all-files: Passed.
  • poetry run pytest -v tests/anoph: All tests passed.

Notes:

Used NumPy 2.2.4 .
Open to adopting local rng instances if preferred for test isolation.

Thanks for reviewing!

Fixes #756

@jonbrenas
Copy link
Collaborator

Thank you @mohamed-laarej.

I am not sure why you introduced a function ensure_int_list and why you added some castings in sample_metadata as mypy and the tests seem to be fine without them. Do you see errors that I don't?

@mohamed-laarej
Copy link
Collaborator Author

Hi @jonbrenas,

Thank you so much for taking the time to review my pull request and for your feedback! I really appreciate it.

Regarding your questions:

  1. ensure_int_list function: I introduced this helper function to address some mypy errors I encountered locally after updating the NumPy random functions. I first saw errors in tests/anoph/test_h12.py related to the window_sizes variable on the line window_sizes = sorted(set([int(x) for x in window_sizes])) (e.g., Argument 1 to "int" has incompatible type... and Item "int" of ... has no attribute "iter"). My initial attempt to fix this by checking isinstance(window_sizes, int) didn't resolve the mypy errors. When I encountered the exact same error pattern later in tests/anoph/test_g123.py, I created the ensure_int_list function to reliably handle the potentially varied types inferred by mypy for window_sizes and ensure it was processed as a flat list[int]. This resolved the errors I was seeing locally in both files.

  2. Type Castings in sample_metadata.py: Similarly, the cast() calls in this file were added to resolve mypy errors I saw locally. Specifically, I encountered an Incompatible types in assignment (expression has type "int | list[int] | ...", variable has type "list[int]") error for the sample_indices variable assignment on line 896. Adding cast(List[int], ...) resolved this by explicitly telling mypy the expected type from np.nonzero(...).tolist(). The other cast was also added in response to a type error (Type argument "bool" of "dtype"...) I saw during local checks.

I understand that these errors might not be appearing in the CI environment or on your end. It's possible there are slight environment differences, or perhaps other changes resolved the underlying type issues. My intention was just to ensure the code passed mypy checks based on the errors I was seeing locally after the NumPy updates.

If these specific casts and the ensure_int_list function are confirmed to be unnecessary for the tests and mypy to pass in the project's main environment, I am absolutely happy to remove them to keep the code cleaner. Please let me know how you'd like to proceed.

Thanks again for your guidance!

@jonbrenas
Copy link
Collaborator

jonbrenas commented Apr 30, 2025

Thank you @mohamed-laarej . I think both the casts and the additional function are fine. I feel like they clutter the code a bit but not more than other changes that are required so I don't think they are really problematic.

I know they don't raise errors in mypy but I think it would be better to deal with all occurrences of random at the same time. Most other tests also use some random value and I think you missed one of the random.random.randint in test_g123.py.

It would also make sense to unpin the value of numpy to have the tests the newest version (and prove that way that the issue is solved).

@mohamed-laarej
Copy link
Collaborator Author

Thanks @jonbrenas, that makes sense! I'll work on fixing the remaining random.randint in test_g123.py, checking the other test files for legacy random functions, and updating pyproject.toml to unpin NumPy.

Interestingly, my local environment was already using an unpinned NumPy version during my initial work, but I'll ensure the project configuration reflects this now.

I'll push the updates once they're ready.

generator (rng) instead of legacy np.random or Python's random module and unpins the NumPy version in pyproject.toml
mohamed-laarej added a commit to mohamed-laarej/malariagen-data-python that referenced this pull request May 16, 2025
- Replaced all Python random.choice() with rng.choice() for consistency
- Replaced random.sample() with rng.choice(..., replace=False)
- Added .tolist() to convert NumPy arrays to Python lists where needed
- Added str() casting for np.str_ values to ensure Python string compatibility
- Fixed 'low >= high' errors in rng.integers() calls by ensuring high > low
- Specifically fixed tests/anoph/test_frq.py by changing rng.integers(1, len(cohorts_areas)) to rng.integers(1, len(cohorts_areas)+1) to avoid invalid ranges
- Applied int() casting to NumPy integer types where Python int was expected
- Fixed site_mask selection to ensure only valid masks are used for each test context

Addresses feedback from PR malariagen#760 and resolves test failures.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Linting errors connected to numpy 2.2

2 participants