Skip to content

Commit e6c1136

Browse files
jeromekelleherhossam26644
authored andcommitted
Add tests for fixed pedigree and clarify semantics; document
Closes #2157 Closes #1898
1 parent 0dbfda9 commit e6c1136

File tree

11 files changed

+197
-145
lines changed

11 files changed

+197
-145
lines changed

CHANGELOG.md

Lines changed: 1 addition & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -7,13 +7,10 @@
77

88
- The `FixedPedigree` simulation model now supports internal samples ({issue}`1855`,
99
{pr}`2321`, {pr}`2326`, {pr}`2388`, {user}`abureau`, {user}`jeromekelleher`).
10-
1110
- Add support and wheels for Python3.13
1211
- Support for simulations after local mrca,
13-
({pr}`2396`, {user}`jeromekelleher`, {user}`hossam26644`).
14-
12+
({issue}`2157`, {pr}`2396`, {user}`jeromekelleher`, {user}`hossam26644`).
1513
- Drop Python 3.9 support, require Python >= 3.10 ({pr}`2418`, {user}`benjeffery`)
16-
1714
- Add wheels on Windows ({pr}`2414`, {issue}`2200`,{user}`benjeffery`)
1815

1916
**Breaking changes**:

algorithms.py

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1541,10 +1541,8 @@ def dtwf_simulate(self, end_time):
15411541
def dtwf_generation(self):
15421542
"""
15431543
Evolves one generation of a Wright Fisher population
1544-
Returns True if any events occurred, False otherwise.
15451544
"""
15461545
# Migration events happen at the rates in the matrix.
1547-
15481546
for j in range(len(self.P)):
15491547
source_size = self.P[j].get_num_ancestors()
15501548
for k in range(len(self.P)):
@@ -2263,7 +2261,6 @@ def merge_ancestors(self, H, pop_id, label, new_node_id=-1):
22632261
if r_max not in self.S:
22642262
j = self.S.floor_key(r_max)
22652263
self.S[r_max] = self.S[j]
2266-
22672264
min_overlap = len(X) if self.stop_at_local_mrca else 0
22682265
# Update the number of extant segments.
22692266
if self.S[left] == min_overlap:

docs/ancestry.md

Lines changed: 44 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -1481,17 +1481,16 @@ Migrations nodes are also recorded in the ARG using the
14811481
`NodeType.MIGRANT` flag. See the {ref}`sec_api_node_flags`
14821482
section for more details.
14831483

1484-
(sec_simulation_after_mrca)=
1484+
(sec_ancestry_stop_at_local_mrca)=
14851485

1486-
### Simulations after local MRCA
1486+
### Simulating ancestral to local MRCAs
14871487

14881488
By default, msprime stops simulating local trees when a local most recent
14891489
common ancestor (MRCA) is found. This is because events that occur above the
14901490
common ancestor are shared across all samples, and we are usually interested in
1491-
differences between samples.
1492-
1493-
However, for some specialised applications, simulations after the local MRCA
1494-
might be needed. In this case, we set the parameter `stop_at_local_mrca` of
1491+
differences between samples. However, it is sometime useful to simulate ancestry
1492+
across the full span of the region until **all** local MRCAs have been reached.
1493+
To do this, we set the parameter `stop_at_local_mrca` of
14951494
{func}`.sim_ancestry` to `False`.
14961495

14971496
```{code-cell}
@@ -1508,15 +1507,10 @@ because 11 is the root of that tree.
15081507

15091508
The stopping condition remains the same---we keep simulating until an MRCA has been
15101509
found at *all* local trees. The interpretation of the the final "root" nodes
1511-
is a little subtle
1512-
1513-
:::{note}
1514-
When using the ``stop_at_local_mrca=False`` option you usually need to specify an
1515-
``end_time`` argument (see {ref}`sec_ancestry_end_time`) or the simulation will
1516-
run indefinitely! It is in general not possible to know what the "right" end
1517-
time for a given simulation is, and you may need to specify a "large" value to be
1518-
sure that all trees have coalesced locally.
1519-
:::
1510+
across the trees is a little subtle, and follows the same logic as used when
1511+
{ref}`sec_ancestry_end_time`. All of the trees have a root node with time
1512+
equal to the oldest MRCA, and the nodes correspond to different lineages present
1513+
in the simulation.
15201514

15211515

15221516
## Manipulating simulation time
@@ -1532,7 +1526,7 @@ for two diploid samples:
15321526

15331527
```{code-cell}
15341528
ts = msprime.sim_ancestry(2, sequence_length=10, recombination_rate=0.1, random_seed=42)
1535-
SVG(ts.draw_svg(y_axis=True, y_ticks=[0, 1, 2, 3, 4, 5]))
1529+
SVG(ts.draw_svg(y_axis=True, y_ticks=[0, 1, 2, 3, 4]))
15361530
```
15371531

15381532
Sometimes we would like to stop the simulation early, **before** complete
@@ -2819,6 +2813,39 @@ Because the input pedigree fully describes the simulation many features such as
28192813
combined with other {ref}`ancestry models<sec_ancestry_models>`.
28202814
:::
28212815

2816+
(sec_ancestry_models_fixed_pedigree_tracing)=
2817+
2818+
#### Tracing ancestry through a pedigree
2819+
2820+
The previous example generated trees that are embedded within the pedigree,
2821+
but doesn't tell us directly the exact path that each sample's genetic
2822+
material took through the ancestors. To trace these paths exactly, we can
2823+
use a combination of the ``additional_nodes``,
2824+
``coalescing_segments_only``
2825+
(see {ref}`sec_ancestry_additional_nodes`),
2826+
and ``stop_at_local_mrca`` (see {ref}`sec_ancestry_stop_at_local_mrca`).
2827+
2828+
```{code-cell}
2829+
ped_ts = msprime.sim_ancestry(
2830+
initial_state=pedigree,
2831+
model="fixed_pedigree",
2832+
random_seed=41,
2833+
recombination_rate=0.001,
2834+
coalescing_segments_only=False,
2835+
additional_nodes=(
2836+
msprime.NodeType.COMMON_ANCESTOR |
2837+
msprime.NodeType.PASS_THROUGH),
2838+
stop_at_local_mrca=True
2839+
)
2840+
node_labels = {node.id: f"{node.individual}({node.id})" for node in ped_ts.nodes()}
2841+
SVG(ped_ts.draw_svg(y_axis=True, node_labels=node_labels, size=(600,200)))
2842+
```
2843+
2844+
We can now see that the ancestor of node 15 is 20, which we would have to
2845+
infer in the previous example.
2846+
2847+
2848+
28222849
#### Censoring pedigree information
28232850

28242851
:::{warning}
@@ -2922,6 +2949,7 @@ when calculating population sizes. Thus, the pedigree must be seen as
29222949
entirely **external** to the population model simulated during recapitation.
29232950
:::
29242951

2952+
29252953
(sec_ancestry_models_fixed_pedigree_demography)=
29262954

29272955
#### Pedigrees and demography

docs/api.md

Lines changed: 8 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -232,12 +232,12 @@ nodes in some situations:
232232

233233
```{eval-rst}
234234
.. note::
235-
These flags have been deprecated in favour of using
236-
:class:`NodeType<NodeType>`. This is not a breaking change. The
237-
constant associated with each flag remains the same. However, working
238-
with flags should be more intuitive now that we are relying on the
235+
These flags have been deprecated in favour of using
236+
:class:`NodeType<NodeType>`. This is not a breaking change. The
237+
constant associated with each flag remains the same. However, working
238+
with flags should be more intuitive now that we are relying on the
239239
:class:`python:enum.Flag` functionality.
240-
```
240+
```
241241

242242
```{data} msprime.NODE_IS_RE_EVENT
243243
@@ -261,7 +261,7 @@ adding common ancestor events to the ``additional_nodes`` to be stored.
261261
262262
The node is an ARG migration event identifying the individual that migrated.
263263
Can be used in combination with the ``record_migrations`` option.
264-
Present either with the ``record_full_arg`` flag or when adding migration
264+
Present either with the ``record_full_arg`` flag or when adding migration
265265
events to the ``additional_nodes`` to be stored.
266266
267267
```
@@ -282,9 +282,9 @@ The node was created by a gene conversion event.
282282
```{data} msprime.NODE_IS_PASS_THROUGH
283283
284284
The node identifies an ancestral genome/ploid through which the ancestral
285-
material of only a single lineage passed (so no coalescence or common
285+
material of only a single lineage passed (so no coalescence or common
286286
ancestor event). Can be used in combination with the ``record_migrations`` option.
287-
Only present when storing pass through events using the
287+
Only present when storing pass through events using the
288288
``additional_nodes`` option. And only compatible with ``DTWF``
289289
and ``FixedPedigree`` models.
290290

lib/msprime.c

Lines changed: 18 additions & 34 deletions
Original file line numberDiff line numberDiff line change
@@ -5189,7 +5189,6 @@ msp_run_coalescent(msp_t *self, double max_time, unsigned long max_events)
51895189
GSL_MIN(gc_t_wait, GSL_MIN(gc_left_t_wait, GSL_MIN(re_t_wait, ca_t_wait))));
51905190

51915191
if (fixed_event_time == DBL_MAX && t_wait == DBL_MAX) {
5192-
msp_print_state(self, stdout);
51935192
ret = MSP_ERR_INFINITE_WAITING_TIME;
51945193
goto out;
51955194
}
@@ -5405,7 +5404,7 @@ msp_run_pedigree(msp_t *self, double max_time, unsigned long max_events)
54055404
pedigree->next_individual++;
54065405
num_events++;
54075406
}
5408-
if (msp_get_num_ancestors(self) == 0) {
5407+
if (msp_is_completed(self) && self->stop_at_local_mrca) {
54095408
ret = MSP_EXIT_COALESCENCE;
54105409
} else if (pedigree->next_individual == num_individuals) {
54115410
ret = MSP_EXIT_MODEL_COMPLETE;
@@ -5534,7 +5533,6 @@ msp_dtwf_generation(msp_t *self)
55345533
}
55355534
}
55365535
}
5537-
55385536
free(parents);
55395537
free(segment_mem);
55405538
segment_mem = NULL;
@@ -5738,14 +5736,13 @@ msp_run_dtwf(msp_t *self, double max_time, unsigned long max_events)
57385736
goto out;
57395737
}
57405738
ret = msp_apply_demographic_events(self, self->next_demographic_event->time);
5741-
// Demographic events can change population structure
57425739
if (ret != 0) {
57435740
goto out;
57445741
}
57455742
}
57465743
self->time = cur_time;
57475744
ret = msp_dtwf_generation(self);
5748-
if (ret < 0) {
5745+
if (ret != 0) {
57495746
goto out;
57505747
}
57515748

@@ -6084,11 +6081,7 @@ msp_run(msp_t *self, double max_time, unsigned long max_events)
60846081
goto out;
60856082
}
60866083

6087-
if (msp_is_completed(self)) {
6088-
/* If the simulation is completed, run() is a no-op for
6089-
* all models. */
6090-
ret = 0;
6091-
} else if (self->model.type == MSP_MODEL_DTWF) {
6084+
if (self->model.type == MSP_MODEL_DTWF) {
60926085
ret = msp_run_dtwf(self, max_time, max_events);
60936086
} else if (self->model.type == MSP_MODEL_WF_PED) {
60946087
ret = msp_run_pedigree(self, max_time, max_events);
@@ -6515,36 +6508,27 @@ msp_get_time(msp_t *self)
65156508
int
65166509
msp_is_completed(msp_t *self)
65176510
{
6518-
/* int model = self->model.type; */
65196511
bool completed;
65206512
avl_node_t *node;
65216513
node_mapping_t *nm;
65226514

6523-
/* NOTE! This is a quick first implementation. This could be a performance
6524-
* bottleneck and maybe we keep track of whether there's any non-1 values
6525-
* instead
6526-
*/
6527-
completed = true;
6528-
for (node = self->overlap_counts.head; node->next != NULL; node = node->next) {
6529-
nm = (node_mapping_t *) node->item;
6530-
if (nm->value > 1) {
6531-
completed = false;
6532-
break;
6515+
if (self->stop_at_local_mrca) {
6516+
/* When we stop at the local MRCA we have a cheap way of testing for completion
6517+
*/
6518+
completed = msp_get_num_ancestors(self) == 0;
6519+
} else {
6520+
/* When we simulate ancestry after local roots, we have to look at the overlap
6521+
* counts to find when we've coalesced.
6522+
*/
6523+
completed = true;
6524+
for (node = self->overlap_counts.head; node->next != NULL; node = node->next) {
6525+
nm = (node_mapping_t *) node->item;
6526+
if (nm->value > 1) {
6527+
completed = false;
6528+
break;
6529+
}
65336530
}
65346531
}
6535-
6536-
/* } */
6537-
6538-
/* if (model == MSP_MODEL_HUDSON || model == MSP_MODEL_SMC_K || model ==
6539-
* MSP_MODEL_BETA */
6540-
/* || model == MSP_MODEL_DIRAC || model == MSP_MODEL_SMC */
6541-
/* || model == MSP_MODEL_SMC_PRIME */
6542-
/* || model == MSP_MODEL_DTWF) { */
6543-
/* completed = msp_coalescent_is_complete(self); */
6544-
/* } else { */
6545-
/* completed = msp_get_num_ancestors(self) == 0; */
6546-
/* } */
6547-
65486532
return self->state == MSP_STATE_SIMULATING && completed;
65496533
}
65506534

lib/tests/test_ancestry.c

Lines changed: 1 addition & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -817,22 +817,16 @@ test_single_locus_continue_after_local_mrca(void)
817817
tsk_treeseq_t ts;
818818
tsk_tree_t tree;
819819
tsk_id_t root;
820-
821820
gsl_rng *rng = safe_rng_alloc();
822821

823-
gsl_rng_set(rng, seed);
824-
825822
for (j = 0; j < sizeof(models) / sizeof(int); j++) {
826-
823+
gsl_rng_set(rng, seed);
827824
ret = build_sim(&msp, &tables, rng, m, 1, NULL, n);
828825
ret = msp_set_recombination_rate(&msp, 0);
829826
CU_ASSERT_EQUAL_FATAL(ret, 0);
830-
831827
ret = msp_set_stop_at_local_mrca(&msp, false);
832828
CU_ASSERT_EQUAL_FATAL(ret, 0);
833-
834829
set_simulation_model(&msp, models[j]);
835-
836830
ret = msp_initialise(&msp);
837831
CU_ASSERT_EQUAL(ret, 0);
838832
ret = msp_run(&msp, DBL_MAX, ULONG_MAX);
@@ -845,17 +839,14 @@ test_single_locus_continue_after_local_mrca(void)
845839
ret = tsk_tree_init(&tree, &ts, 0);
846840
CU_ASSERT_EQUAL_FATAL(ret, 0);
847841
CU_ASSERT_EQUAL_FATAL(tsk_treeseq_get_num_trees(&ts), 1);
848-
849842
ret = tsk_tree_first(&tree);
850843
CU_ASSERT_EQUAL_FATAL(ret, TSK_TREE_OK);
851844
CU_ASSERT_EQUAL_FATAL(tsk_tree_get_num_roots(&tree), 1);
852-
853845
root = tsk_tree_get_left_root(&tree);
854846
CU_ASSERT_EQUAL(msp.tables->nodes.time[root], msp_get_time(&msp));
855847

856848
model = msp_get_model(&msp)->type;
857849
CU_ASSERT_EQUAL(model, models[j]);
858-
859850
ret = msp_free(&msp);
860851
CU_ASSERT_EQUAL(ret, 0);
861852
tsk_table_collection_free(&tables);

lib/tests/test_pedigrees.c

Lines changed: 17 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -207,7 +207,7 @@ verify_pedigree(double recombination_rate, unsigned long seed,
207207
static void
208208
verify_pedigree_event_by_event(double recombination_rate, unsigned long seed,
209209
tsk_size_t num_individuals, tsk_id_t *parents, double *time, tsk_flags_t *is_sample,
210-
tsk_id_t *population, uint32_t additional_nodes)
210+
tsk_id_t *population, uint32_t additional_nodes, bool stop_at_local_mrca)
211211
{
212212
int ret, status1, status2;
213213
int ploidy = 2;
@@ -223,6 +223,9 @@ verify_pedigree_event_by_event(double recombination_rate, unsigned long seed,
223223
parents, time, is_sample, population);
224224
CU_ASSERT_EQUAL_FATAL(ret, 0);
225225

226+
msp_set_stop_at_local_mrca(&msp1, stop_at_local_mrca);
227+
msp_set_stop_at_local_mrca(&msp2, stop_at_local_mrca);
228+
226229
ret = msp_initialise(&msp1);
227230
CU_ASSERT_EQUAL_FATAL(ret, 0);
228231
ret = msp_initialise(&msp2);
@@ -899,8 +902,9 @@ test_internal_samples(void)
899902
verify_pedigree(0, 1, 3, parents, time, is_sample, NULL, 0);
900903
verify_pedigree(0.1, 1, 3, parents, time, is_sample, NULL, 0);
901904

902-
verify_pedigree_event_by_event(0, 1, 3, parents, time, is_sample, NULL, 0);
903-
verify_pedigree_event_by_event(0.1, 1, 3, parents, time, is_sample, NULL, 0);
905+
verify_pedigree_event_by_event(0, 1, 3, parents, time, is_sample, NULL, 0, true);
906+
verify_pedigree_event_by_event(0.1, 1, 3, parents, time, is_sample, NULL, 0, true);
907+
verify_pedigree_event_by_event(0.1, 1, 3, parents, time, is_sample, NULL, 0, false);
904908
}
905909

906910
static void
@@ -919,8 +923,12 @@ test_no_leaf_samples(void)
919923
verify_pedigree(0, 1, num_inds, parents, time, is_sample, NULL, 0);
920924
verify_pedigree(0.1, 1, num_inds, parents, time, is_sample, NULL, 0);
921925

922-
verify_pedigree_event_by_event(0, 1, num_inds, parents, time, is_sample, NULL, 0);
923-
verify_pedigree_event_by_event(0.1, 1, num_inds, parents, time, is_sample, NULL, 0);
926+
verify_pedigree_event_by_event(
927+
0, 1, num_inds, parents, time, is_sample, NULL, 0, true);
928+
verify_pedigree_event_by_event(
929+
0, 1, num_inds, parents, time, is_sample, NULL, 0, false);
930+
verify_pedigree_event_by_event(
931+
0.1, 1, num_inds, parents, time, is_sample, NULL, 0, true);
924932
}
925933

926934
static void
@@ -966,7 +974,8 @@ test_trio_same_pop(void)
966974
tsk_id_t population[] = { 2, 2, 2 };
967975

968976
verify_pedigree(0, 1, 3, parents, time, NULL, population, 0);
969-
verify_pedigree_event_by_event(0, 1, 3, parents, time, NULL, population, 0);
977+
verify_pedigree_event_by_event(0, 1, 3, parents, time, NULL, population, 0, true);
978+
verify_pedigree_event_by_event(0, 1, 3, parents, time, NULL, population, 0, false);
970979
}
971980

972981
static void
@@ -978,7 +987,8 @@ test_trio_child_different_pop(void)
978987
tsk_id_t population[] = { 2, 2, 1 };
979988

980989
verify_pedigree(0, 1, 3, parents, time, NULL, population, 0);
981-
verify_pedigree_event_by_event(0, 1, 3, parents, time, NULL, population, 0);
990+
verify_pedigree_event_by_event(0, 1, 3, parents, time, NULL, population, 0, true);
991+
verify_pedigree_event_by_event(0, 1, 3, parents, time, NULL, population, 0, false);
982992
}
983993

984994
int

0 commit comments

Comments
 (0)