-
Notifications
You must be signed in to change notification settings - Fork 89
Feat:/simulate after local mrca #2396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Feat:/simulate after local mrca #2396
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2396 +/- ##
=======================================
Coverage 91.23% 91.23%
=======================================
Files 20 20
Lines 11831 11847 +16
Branches 2296 2300 +4
=======================================
+ Hits 10794 10809 +15
Misses 568 568
- Partials 469 470 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Implementation looks good, it seems unlikely that it's a memory leak to me. Let's look at the size of the output tables I think. Probably this will be a function of your value of You need to run the C code through clang-format as a minimum to get CI passing. |
|
Note specific requirements for clang-format: https://tskit.dev/tskit/docs/stable/development.html#sec-development-c-requirements |
|
It is ready for review now. Please check:
|
jeromekelleher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Some minor comment above.
Also needed:
- Some C unit tests. We should be able to add a test or two to test_ancestry.c to cover the functionality. Nothing elaborate, just making sure the code is covered by valgrind.
- We also need to cover the other models. This will require updating msp_merge_ancestors in a similar way. Best to start with algorithms.py on this.
lib/msprime.c
Outdated
| for (avl_node = self->non_empty_populations.head; avl_node != NULL; | ||
| avl_node = avl_node->next) { | ||
| pop_id = (tsk_id_t)(intptr_t) avl_node->item; | ||
| pop_id = (tsk_id_t) (intptr_t) avl_node->item; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to be a bunch of formatting noise here - which version of clang-format are you using?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
version 18.. I will remove this noise
|
I'm going to mute this thread @hossam26644 - please ping me when you'd like a review |
|
@jeromekelleher check now please, and please check the implementation of merge_ancestors. It works and passes the tests that I put.. but I am not sure if I am missing something |
jeromekelleher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looking good. Needs more models though, in particular the pedigree model. You need to construct a pedigree in which full coalescence will happen and verify that you still go through the rest of the nodes (you'll need store_unary as well I guess, but that's true anyway?)
msprime/ancestry.py
Outdated
|
|
||
| if stop_at_local_mrca is None: | ||
| stop_at_local_mrca = True | ||
| if not stop_at_local_mrca: # when set to False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is redundant
lib/tests/test_ancestry.c
Outdated
| CU_ASSERT_EQUAL(ret, 0); | ||
|
|
||
| ret = msp_run(&msp, t, ULONG_MAX); | ||
| CU_ASSERT_EQUAL(ret, 2); // exit on max time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the symbolic constant, no need for comment then
26f42eb to
640ff6e
Compare
|
@jeromekelleher
One more thing: I added a condition to stop simulations when there are no events to happen and stop_at_local_mrca is False, i.e., no coalescence, GC, recombination, or migration events; it used to give an error before, since simulations should stop one step before that. This error was raised for smc models only, while for Hudson's model there are always recombination then coalescence events (unless the seq length is 1 base, then it behaves like the smc models). These recombination and coalescence events happen even when the recombination rate is zero (for Hudson). Now, if we can put a condition when simulations get to this infinite cycle of recombination and coalescence, we won't have to specify an end time; simulations can run until more time means nothing (no unary nodes no recombination no coalescence). I don't understand fully why we see this when recombination rate == zero, so maybe I am missing important details; but if think it will be useful, I can remove the requirement for an end time for simulations above the root. |
84de697 to
f9af0ec
Compare
jeromekelleher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The infinite waiting time problem is in general hard (there's some references to it in the documentation if you look for it). It would be great if we could keep simulating until there are no events for the SMC, but we'll have to be careful not to introduce any unexpected behaviours otherwise.
Re the pedigree model - I don't see why it's not relevant? We can construct a pedigree in which coalescence happens very quickly, but we still want to track how the ancestral material passes through the ancestors. What happens at the moment when partial or full coalescence happens within the pedigree?
|
@jeromekelleher we can go for another round of review:
|
jeromekelleher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not following some of the logic, but I think I just need to check it out an play with it myself to get my head around it. I think we should just clean this much up, squash the commits and merge. We can register a few follow-ups issues to close before release.
algorithms.py
Outdated
| mig_dest = k | ||
| min_time = min(t_re, t_ca, t_gcin, t_gc_left, t_mig) | ||
| assert min_time != infinity | ||
| assert (min_time != infinity) or ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparing with the False singleton is an antipattern. Use not self.stop_at_local_mrca
algorithms.py
Outdated
| else: | ||
| self.t += min_time | ||
| if min_time == t_re: | ||
| if (min_time == infinity) and self.stop_at_local_mrca is False: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (min_time == infinity) and self.stop_at_local_mrca is False: | |
| if (not self.stop_at_local_mrca) and (min_time == infinity): |
Put the constant first so it gets short-circuited. (Not that it matters here, but just as a general principle)
msprime/ancestry.py
Outdated
| """ | ||
|
|
||
|
|
||
| class PotentialInfiniteSimulationhWarning(UserWarning): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| class PotentialInfiniteSimulationhWarning(UserWarning): | |
| class PotentialInfiniteSimulationWarning(UserWarning): |
lib/msprime.c
Outdated
| out: | ||
| msp_safe_free(parents); | ||
| msp_safe_free(segment_mem); | ||
| if (coalescence_occurred == false && recombination_occurred == false && ret == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These out blocks are for error condition and cleanup handling only - move this block up to before the goto. Also you don't need to test for ret == 0 then.
Also, change to if ((!coalescence_occurred)) && (!recombination_occurred))
lib/msprime.c
Outdated
| goto out; | ||
| } | ||
| ret = msp_apply_demographic_events(self, self->next_demographic_event->time); | ||
| migration_occurred |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same point as before about comments on the same line
lib/msprime.c
Outdated
| self->time = cur_time; | ||
| ret = msp_dtwf_generation(self); | ||
| if (ret != 0) { | ||
| if (ret != 0 && ret != MSP_DTWF_DID_NOTHING) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to change the condition to if (ret < 0) .
lib/msprime.c
Outdated
| self->next_sampling_event++; | ||
| } | ||
|
|
||
| if (ret == MSP_DTWF_DID_NOTHING && migration_occurred == false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same style points as above
lib/msprime.c
Outdated
| msp_safe_free(node_trees); | ||
| msp_safe_free(n); | ||
| msp_safe_free(mig_tmp); | ||
| if (ret == MSP_DTWF_DID_NOTHING) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto on putting non cleanup logic in out block
f083429 to
2a250e9
Compare
2a250e9 to
ec37d9f
Compare
|
@hossam26644 just FYI - I started looking into this in detail and it seemed easier to me to fix the stopping condition than to explain why we needed the max_time thing. I've fixed it for Hudson and am filling out the rest of the details now. |
|
Hmm - looks like there's a basic flaw in the logic for the multi-merger case, which wasn't really being tested. I'll need to spend more time on this to get to the bottom of it. |
|
I can invest some time into it. Actually, I would like to. But I am prioritising the paper draft now. If this is not urgent, maybe it can sit for a while until after the new year. |
|
Leave it with me - I'm going to see how far I can get with it today. There will be some follow-up work to do all right though, don't worry! |
239b8bb to
be1b404
Compare
be1b404 to
2ef10be
Compare
|
OK, I think this is basically done. Can you review please @hossam26644? I'm going to log some follow-up issues. |
Here you go: @jeromekelleher
Still missing some tests, and I want to change the flag name of the python algorithm to match the msprime implementation.
As a reminder, this implementation consumes more memory than the when stopping at the youngest root. It could be the node table getting much larger, and could be a memory leak.