Cumulative edge area (i.e. mutational area) over time #3122
hyanwong started this conversation in Show and tell
Replies: 2 comments 5 replies
-
Nice; this seems like a good diagnostic plot. Just for fun, here's a version that avoids the
4 replies
-
A tree-based version can be obtained using the tree.edge_array indexes (although this is not incremental):

import numpy as np
import tskit

def cumulative_branch_length(tree):
    """
    Calculate the cumulative branch length in the tree going from oldest to youngest
    (unique) node times in the tree. This is equivalent to the
    area available for mutations. The last value in the returned `cumulative_lengths`
    vector is the total branch length of this tree.
    Note that if there are any mutations above local roots, these do
    not occur on edges, and hence are not relevant to this function.

    Parameters:
        tree: tskit.Tree object
    Returns:
        tuple: (times, cumulative_lengths)
            times: array of unique node times where branches start or end, sorted descending
            cumulative_lengths: cumulative branch lengths from an indefinitely long time ago to
                each timepoint in the `times` array
    """
    ts = tree.tree_sequence
    # IDs of the edges present in this tree (entries are tskit.NULL where a node has no edge above it)
    used_ids = tree.edge_array[tree.edge_array != tskit.NULL]
    # Negate the times so that ascending order runs from oldest to youngest
    starts = -ts.nodes_time[ts.edges_parent[used_ids]]
    ends = -ts.nodes_time[ts.edges_child[used_ids]]
    # Create event arrays: each event has a position and a num lineage change
    # Start events add lineages, end events subtract lineages
    event_times = np.concatenate([starts, ends])
    event_lineage_count = np.concatenate([np.ones(len(used_ids)), -np.ones(len(used_ids))])
    times = np.unique(event_times)
    dt = np.diff(times)
    assert np.all(dt > 0)
    # Map each event onto the index of its time in the sorted unique times
    event_ind = np.searchsorted(times, event_times)
    # Number of lineages in each interval, multiplied by the interval length, then accumulated
    num_lineages = np.cumsum(np.bincount(event_ind, weights=event_lineage_count))[:-1]
    cumulative_areas = np.cumsum(dt * num_lineages)
    return -times, np.concatenate([[0.0], cumulative_areas])
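
One way to sanity-check values like these is to evaluate the same quantity by brute force: the cumulative length at time t should equal the summed length of the parts of all edges that are older than t. The sketch below does that on plain NumPy arrays, so it runs without tskit. The helper `area_older_than` and the tiny 3-sample tree are hypothetical; with a real tree the two arrays would presumably come from `ts.nodes_time[ts.edges_parent[used_ids]]` and `ts.nodes_time[ts.edges_child[used_ids]]`.

```python
import numpy as np

def area_older_than(t, parent_time, child_time):
    """Total branch length lying at times older than t (brute force, O(edges) per call)."""
    parent_time = np.asarray(parent_time, dtype=float)
    child_time = np.asarray(child_time, dtype=float)
    # The part of each edge older than t runs from max(child_time, t) up to parent_time;
    # edges entirely younger than t give a negative value, which is clipped to zero.
    return np.clip(parent_time - np.maximum(child_time, t), 0.0, None).sum()

# Hypothetical 3-sample tree: node 3 (time 1) is the parent of samples 0 and 1;
# node 4 (time 3) is the parent of node 3 and sample 2.
parent_time = np.array([1.0, 1.0, 3.0, 3.0])
child_time = np.array([0.0, 0.0, 1.0, 0.0])

times = np.array([3.0, 1.0, 0.0])  # unique node times, oldest first
areas = np.array([area_older_than(t, parent_time, child_time) for t in times])
print(areas)  # [0. 4. 7.]
```

Between times 3 and 1 there are two lineages (area 4), and between 1 and 0 there are three (area 3), so the last value, 7, is the total branch length. This costs one pass over the edges per query time, so it is only useful as a cross-check, not as a replacement for the event-based calculation.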
1 reply
-
I wanted to calculate the mutational area over time of a large tree sequence, and compare that with the actual observed accumulation of mutations. It wasn't obvious to me how to get the cumulative area of the edges as a function of time, but a bit of AI querying got me the following fast implementation, so I'm sticking it here for posterity:
Test to check that the eCDF of simulated mutation times agrees with the cumulative area
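
As a rough, self-contained illustration of that kind of check (a sketch, not the implementation posted here), the code below computes a span-weighted cumulative edge area from plain arrays and compares it with the empirical CDF of mutation times placed uniformly over that area. Everything in it is an assumption made for illustration: the function name `cumulative_edge_area`, the arbitrary edge arrays (with a real tree sequence they would presumably be `ts.nodes_time[ts.edges_parent]`, `ts.nodes_time[ts.edges_child]` and `ts.edges_right - ts.edges_left`), and the simple NumPy simulation standing in for a proper mutation simulator.

```python
import numpy as np

def cumulative_edge_area(parent_time, child_time, span):
    """Return (times, areas): unique event times, oldest first, and the
    span-weighted edge area older than each of those times."""
    span = np.asarray(span, dtype=float)
    # Negate times so that ascending order runs from oldest to youngest
    starts = -np.asarray(parent_time, dtype=float)
    ends = -np.asarray(child_time, dtype=float)
    event_times = np.concatenate([starts, ends])
    # Each edge adds its span at its parent time and removes it at its child time
    weights = np.concatenate([span, -span])
    times = np.unique(event_times)
    event_ind = np.searchsorted(times, event_times)
    # Total span of edges alive in each interval between consecutive times
    alive_span = np.cumsum(np.bincount(event_ind, weights=weights))[:-1]
    areas = np.concatenate([[0.0], np.cumsum(np.diff(times) * alive_span)])
    return -times, areas

# Arbitrary illustrative edges (not taken from any real tree sequence)
parent_time = np.array([1.0, 1.0, 3.0, 3.0, 2.0, 2.0])
child_time = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
span = np.array([10.0, 10.0, 10.0, 4.0, 6.0, 6.0])

times, areas = cumulative_edge_area(parent_time, child_time, span)
expected = areas / areas[-1]  # expected fraction of mutations older than each time

# Place mutations uniformly over the area: pick an edge with probability
# proportional to span * length, then a uniform time along that edge.
rng = np.random.default_rng(1)
edge_area = span * (parent_time - child_time)
edge = rng.choice(len(edge_area), size=100_000, p=edge_area / edge_area.sum())
mut_time = rng.uniform(child_time[edge], parent_time[edge])

# Empirical CDF counted from the oldest time, evaluated at the same times
ecdf = np.array([(mut_time >= t).mean() for t in times])
max_diff = np.abs(ecdf - expected).max()
print(max_diff)
```

If mutations really are spread uniformly over the edge area, `max_diff` should shrink roughly as one over the square root of the number of mutations. A persistent gap at particular times would suggest that the observed mutations are not accumulating in proportion to the available area.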