surface_postprocess_trie
- surface_postprocess_trie(df: polars.DataFrame, *, drop_dstream_metadata: bool | None = None, trie_postprocessor: typing.Callable = <hstrat.phylogenetic_inference.tree.trie_postprocess._NopTriePostprocessor.NopTriePostprocessor object>, delete_trunk: bool = True) → DataFrame
Postprocess raw phylogenetic tree reconstruction output data to create finalized estimate of phylogenetic history.
Performs the following operations:
- Delete trunk nodes with rank less than dstream_S (if delete_trunk is True).
- Collapse unifurcations.
- Assign contiguous IDs to nodes.
- Apply supplied trie_postprocessor functor.
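To make "collapse unifurcations" and "assign contiguous IDs" concrete, here is a minimal pure-Python sketch. It is not the library's implementation: the helper names are hypothetical, the tree is a plain dict rather than a DataFrame, and it assumes roots list themselves as their own ancestor and are always kept.

```python
from collections import Counter


def collapse_unifurcations(ancestor_of):
    """Drop non-root nodes that have exactly one child.

    `ancestor_of` maps node id -> ancestor id. Assumption: a root lists
    itself as its own ancestor, and roots are never collapsed.
    """
    # Count children per node, ignoring each root's self-reference.
    child_counts = Counter(
        anc for node, anc in ancestor_of.items() if anc != node
    )

    def is_unifurcation(node):
        return ancestor_of[node] != node and child_counts[node] == 1

    def nearest_kept_ancestor(node):
        # Walk upward past any collapsed nodes; stops at a root at the latest.
        anc = ancestor_of[node]
        while is_unifurcation(anc):
            anc = ancestor_of[anc]
        return anc

    return {
        node: nearest_kept_ancestor(node)
        for node in ancestor_of
        if not is_unifurcation(node)
    }


def assign_contiguous_ids(ancestor_of):
    """Relabel nodes 0..n-1, preserving the relative order of old ids."""
    new_id = {old: new for new, old in enumerate(sorted(ancestor_of))}
    return {new_id[n]: new_id[a] for n, a in ancestor_of.items()}


# Root 0 has children 1 and 5; node 1 has the single child 2,
# which in turn has children 3 and 4.
tree = {0: 0, 1: 0, 2: 1, 3: 2, 4: 2, 5: 0}
collapsed = collapse_unifurcations(tree)
relabeled = assign_contiguous_ids(collapsed)
print(collapsed)  # {0: 0, 2: 0, 3: 2, 4: 2, 5: 0}
print(relabeled)  # {0: 0, 1: 0, 2: 1, 3: 1, 4: 0}
```

Node 1 is removed because it has a single child, and the remaining ids are then renumbered so that no gaps remain.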
Parameters
- df : pl.DataFrame
The input DataFrame containing packed data with required columns, one row per genome.
- Required schema:
- ‘id’ : pl.UInt64
Unique identifier for each taxon (RE alife standard format).
- ‘ancestor_id’ : pl.UInt64
Unique identifier for ancestor taxon (RE alife standard format).
- ‘dstream_rank’ : pl.UInt64
Number of generations elapsed for ancestral differentia.
Corresponds to dstream_Tbar for inner nodes.
Corresponds to dstream_T - 1 for leaf nodes.
- ‘hstrat_differentia_bitwidth’ : pl.UInt32
Size of annotation differentiae, in bits.
Corresponds to dstream_value_bitwidth.
- ‘dstream_S’ : pl.UInt32
Capacity of dstream buffer used for hstrat surface, in number of data items (i.e., differentia values).
- Optional schema:
- ‘dstream_data_id’ : pl.UInt64
Unique identifier for each genome in the source genome dataframe.
- delete_trunk : bool, default True
Should trunk nodes with rank less than dstream_S be deleted?
Trunk deletion accounts for “dummy” strata added to fill hstrat surface for founding ancestor(s), by segregating subtrees with distinct founding strata into independent trees.
- trie_postprocessor : Callable, default hstrat.NopTriePostprocessor()
Tree postprocess functor.
Must accept the parameters trie (type pandas.DataFrame), p_differentia_collision (type float), mutate (type bool), and progress_wrap (type Callable). Must return the postprocessed trie (type pl.DataFrame).
To apply multiple postprocessors, use hstrat.CompoundTriePostprocessor.
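As an illustration of the call signature described for trie_postprocessor, here is a minimal sketch of a pass-through functor. The class name, the default argument values, and the probability range check are assumptions made for this example, not part of the documented API. It shows only the shape of the call and returns the trie in whatever type it received.

```python
class PassthroughTriePostprocessor:
    """Hypothetical functor that returns the trie unchanged."""

    def __call__(
        self,
        trie,                          # documented as a pandas.DataFrame
        p_differentia_collision,       # float
        mutate=False,                  # assumed default
        progress_wrap=lambda x: x,     # assumed default: identity wrapper
    ):
        # Assumption: the value is a probability, so reject values outside [0, 1].
        if not 0.0 <= p_differentia_collision <= 1.0:
            raise ValueError("p_differentia_collision must lie in [0, 1]")
        # Honor `mutate`: work on a copy unless in-place changes are allowed.
        return trie if mutate else trie.copy()


# A dict stands in for the DataFrame here so the sketch runs without pandas;
# both types provide a .copy() method.
stand_in = {"id": [0, 1], "ancestor_id": [0, 0]}
postprocess = PassthroughTriePostprocessor()
result = postprocess(stand_in, p_differentia_collision=0.5)
same = postprocess(stand_in, p_differentia_collision=0.5, mutate=True)
```

An instance of such a class would be supplied through the trie_postprocessor keyword argument.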
Returns
- pl.DataFrame
The output DataFrame containing the estimated phylogenetic tree in alife standard format, with the following columns:
- Required schema:
- ‘id’ : pl.UInt64
Unique identifier for each taxon (RE alife standard format).
- ‘ancestor_id’ : pl.UInt64
Unique identifier for ancestor taxon (RE alife standard format).
- ‘hstrat_rank’ : pl.Int64
Number of generations elapsed for ancestral differentia.
Corresponds to dstream_Tbar - dstream_S for inner nodes.
Corresponds to dstream_T - 1 - dstream_S for leaf nodes.
- Optional schema:
- ‘origin_time’ : pl.Int64
Estimated origin time for phylogeny nodes, in generations elapsed since founding ancestor.
Value depends on the trie postprocessor used.
Additional user-defined columns will be forwarded from the input DataFrame. Any columns created by the trie postprocessor will also be included.
Note that the alife-standard ancestor_list column is not included in the output.
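Taken together, the input and output rank descriptions imply that hstrat_rank is dstream_rank shifted down by dstream_S. The short worked example below spells out that arithmetic. The helper name and the numeric values are made up for illustration, and the relationship itself is inferred from the column descriptions rather than stated outright.

```python
def to_hstrat_rank(dstream_rank: int, dstream_S: int) -> int:
    """Shift a raw dstream rank down by the surface capacity dstream_S."""
    return dstream_rank - dstream_S


dstream_S = 64  # illustrative surface capacity

# Inner node: dstream_rank corresponds to dstream_Tbar.
inner = to_hstrat_rank(100, dstream_S)

# Leaf node: dstream_rank corresponds to dstream_T - 1, here with dstream_T = 200.
leaf = to_hstrat_rank(200 - 1, dstream_S)

# A trunk node (rank below dstream_S) would come out negative.
trunk = to_hstrat_rank(10, dstream_S)

print(inner, leaf, trunk)  # 36 135 -54
```

The negative value for a trunk node may be why the output column is signed (pl.Int64) while the input rank is unsigned, though the source does not say so explicitly.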
Notes
Deleting trunk nodes with rank less than dstream_S assumes that dstream_S “dummy” strata were added to fill the hstrat surface for founding ancestor(s).
Currently, data is converted to Pandas for processing, then back to Polars.
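The effect of trunk deletion can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the helper name is hypothetical, nodes are held in a dict rather than a DataFrame, roots are assumed to list themselves as their own ancestor, and a surviving node whose ancestor was deleted is promoted to a root.

```python
def delete_trunk_nodes(nodes, dstream_S):
    """Remove nodes with rank < dstream_S and re-root orphaned subtrees.

    `nodes` maps node id -> (ancestor_id, dstream_rank).
    """
    kept = {n for n, (_, rank) in nodes.items() if rank >= dstream_S}
    return {
        # A kept node whose ancestor was deleted becomes its own root.
        n: (anc if anc in kept else n, rank)
        for n, (anc, rank) in nodes.items()
        if n in kept
    }


# Trunk chain 0 -> 1 (ranks 0 and 3), which then splits into nodes 2 and 3
# at rank 4; node 2 has two further descendants.
nodes = {
    0: (0, 0),
    1: (0, 3),
    2: (1, 4),
    3: (1, 4),
    4: (2, 9),
    5: (2, 8),
}
pruned = delete_trunk_nodes(nodes, dstream_S=4)
print(pruned)  # {2: (2, 4), 3: (3, 4), 4: (2, 9), 5: (2, 8)}
```

With the trunk removed, nodes 2 and 3 no longer share an ancestor, so the result is two independent trees.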
See Also
- surface_unpack_reconstruct :
Creates raw reconstruction data postprocessed here.
- alifestd_try_add_ancestor_list_col :
Adds alife-standard ancestor_list column to phylogeny data.