@@ -20,8 +20,8 @@ kernelspec:
2020
2121At its heart, a `tskit` {ref}`tree sequence<sec_what_is>` consists of a list of
2222{ref}`sec_terminology_nodes`, and a list of {ref}`sec_terminology_edges` that connect
23- those nodes. Therefore a succinct tree sequence is equivalent to a
24- [mathematical graph](https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)),
23+ parent to child nodes. Therefore a succinct tree sequence is equivalent to a
24+ [directed graph](https://en.wikipedia.org/wiki/Directed_graph),
2525which is additionally annotated with genomic positions such that at each
2626position, a path through the edges exists which defines a tree. This graph
2727interpretation of a tree sequence is tightly connected to the concept of
@@ -147,14 +147,17 @@ ts_arg.draw_svg(
147147)
148148```
149149
150- The number of children a node has in a local tree can be termed the
151- "local arity" of a node. It is clear from the plot above that both red and blue nodes
152- can have a local arity of one. The act of `simplification` can
153- transform a tree sequence so that all nodes have a local arity of
154- 2 or more, which is [more efficient](sec_args_disadvantages).
155- However, this loses information about the timings
156- and topological operations associated with recombination
157- events and some common ancestor events. This information is useful for
150+ The number of children descending from a node in a local tree can be termed the
151+ "local arity" of that node. It is clear from the plot above that red nodes always
152+ have a local arity of 1, and blue nodes sometimes do. This may seem an unusual
153+ state of affairs: tree representations often focus on branch-points, and ignore nodes
154+ with a single child. Indeed, it is possible to [simplify](sec_args_simplification) the
155+ ARG above, resulting in a graph whose local trees only contain branch points or tips
156+ (i.e. local arity is never 1). Such a graph is [more compact](sec_args_disadvantages)
157+ than the full ARG, but it omits some information about the timings and
158+ topological operations associated with recombination
159+ events and some common ancestor events. This information, as captured by the local
160+ unary nodes, is useful for
158161
1591621 . Retaining precise information about the time and lineages involved in recombination.
160163 This is required e.g. to ensure we can always work out the tree editing (or
@@ -214,6 +217,8 @@ represented, in which both parents at a recombination event trace directly back
214217same common ancestor.
215218:::
216219
220+ (sec_args_simplification)=
221+
217222## Simplification
218223
219224If we fully {ref}`simplify<sec_simplification>` the tree above, all remaining nodes
@@ -302,13 +307,15 @@ structures for simulation or inference is therefore infeasible.
302307
303308## ARG formats and `tskit`
304309
305- In classical ARGs, nodes often represent events (specifically, _common ancestor_,
306- _recombination_, and _sampling_ events), with the genomic regions of inheritance
307- encoded by storing a specific breakpoint location on each recombination node.
308- In contrast, nodes in a `tskit` ARG correspond to _genomes_, and inherited regions
309- are defined by intervals stored on *edges* (via the {attr}`~Edge.left` and
310- {attr}`~Edge.right` properties), rather than on nodes. Here, for example, is the
311- edge table from our ARG:
310+ It is worth noting a subtle and somewhat philosophical
311+ difference between some classical ARG formulations, and the ARG formulation
312+ used in `tskit`. Classically, nodes in an ARG are taken to represent _events_
313+ (specifically, "common ancestor", "recombination", and "sampling" events),
314+ and genomic regions of inheritance are encoded by storing a specific breakpoint location on
315+ each recombination node. In contrast, [nodes](tskit:sec_data_model_definitions_node) in a `tskit`
316+ ARG correspond to _genomes_. More crucially, inherited regions are defined by intervals
317+ stored on *edges* (via the {attr}`~Edge.left` and {attr}`~Edge.right` properties),
318+ rather than on nodes. Here, for example, is the edge table from our ARG:
312319
313320``` {code-cell}
314321ts_arg.tables.edges
@@ -325,7 +332,7 @@ simplification possible, and means `tskit` can encode ancestry without having
325332to pin down exactly when specific ancestral events took place.
326333
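To make the edge-interval encoding concrete, here is a minimal pure-Python sketch. It does not use the `tskit` API: the `Edge` tuple, the toy edge list, and the `local_arity` helper are all hypothetical stand-ins, and the only assumption carried over from `tskit` is that edge intervals are half-open, `[left, right)`. The sketch shows how intervals stored on edges are enough to recover the children of each node (and hence its local arity) at any genomic position.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for one row of an edge table (not the tskit API).
Edge = namedtuple("Edge", ["left", "right", "parent", "child"])

# Toy graph over the genomic interval [0, 10). Node 3 inherits its left
# half from node 4 and its right half from node 5, so the breakpoint at
# position 5 lives on the edges, not on any node.
edges = [
    Edge(0, 10, parent=3, child=0),
    Edge(0, 5, parent=4, child=3),
    Edge(5, 10, parent=5, child=3),
    Edge(0, 10, parent=4, child=1),
    Edge(0, 10, parent=5, child=4),
]


def local_arity(edges, position):
    """Count the children of each parent in the local tree at `position`."""
    arity = Counter()
    for edge in edges:
        # Half-open interval: the edge covers left <= position < right.
        if edge.left <= position < edge.right:
            arity[edge.parent] += 1
    return dict(arity)


left_arity = local_arity(edges, 2)   # local tree to the left of the breakpoint
right_arity = local_arity(edges, 7)  # local tree to the right of the breakpoint
```

In this toy example node 3 is unary on both sides of the breakpoint, while nodes 4 and 5 swap between one and two children. With a real tree sequence, the equivalent query would normally go through `tskit` itself, for instance via `ts.at(position)` and `tree.num_children(u)`, rather than a hand-rolled scan of the edges.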
327334
328- ## Working with the tree sequence graph
335+ ## Working with ARGs in `tskit`
329336
330337All tree sequences, including, but not limited to full ARGs, can be treated as
331338directed (acyclic) graphs. Although many tree sequence operations operate from left to