surface_postprocess_trie

surface_postprocess_trie(df: polars.DataFrame, *, drop_dstream_metadata: bool | None = None, trie_postprocessor: typing.Callable = <hstrat.phylogenetic_inference.tree.trie_postprocess._NopTriePostprocessor.NopTriePostprocessor object>, delete_trunk: bool = True) → DataFrame

Postprocess raw phylogenetic tree reconstruction output data to create a finalized estimate of phylogenetic history.

Performs the following operations:
- Delete trunk nodes with rank less than dstream_S.
- Collapse unifurcations.
- Assign contiguous IDs to nodes.
- Apply supplied trie_postprocessor functor.
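The unifurcation-collapse and ID-reassignment steps listed above can be illustrated on a toy tree. This is a minimal sketch in plain Python, not the library's implementation: the helper names `collapse_unifurcations` and `assign_contiguous_ids` are hypothetical, the tree is a simple `{id: ancestor_id}` dict, and roots are assumed to be marked by being their own ancestor.

```python
def collapse_unifurcations(ancestor_of):
    """Splice out every non-root node that has exactly one child.

    `ancestor_of` maps node id -> ancestor id; a root is assumed to be
    its own ancestor (illustrative convention for this sketch).
    """
    # Count children per node, ignoring each root's self-reference.
    child_count = {}
    for node, ancestor in ancestor_of.items():
        if ancestor != node:
            child_count[ancestor] = child_count.get(ancestor, 0) + 1

    def is_unifurcation(node):
        return child_count.get(node, 0) == 1 and ancestor_of[node] != node

    collapsed = {}
    for node, ancestor in ancestor_of.items():
        if is_unifurcation(node):
            continue  # this node is spliced out
        # Walk upward past any spliced-out ancestors.
        while is_unifurcation(ancestor):
            ancestor = ancestor_of[ancestor]
        collapsed[node] = ancestor
    return collapsed


def assign_contiguous_ids(ancestor_of):
    """Relabel nodes as 0..n-1, preserving the sorted order of old ids."""
    new_id = {old: new for new, old in enumerate(sorted(ancestor_of))}
    return {new_id[n]: new_id[a] for n, a in ancestor_of.items()}


# Toy tree: 0 is the root, 10 has a single child (20), 20 has two children.
tree = {0: 0, 10: 0, 20: 10, 30: 20, 40: 20}
collapsed = collapse_unifurcations(tree)
relabeled = assign_contiguous_ids(collapsed)
```

Here node 10 is removed because it has exactly one child, and the surviving nodes are then renumbered without gaps.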

Parameters

df : pl.DataFrame

The input DataFrame containing packed data with required columns, one row per genome.

Required schema:

‘id’ : pl.UInt64
- Unique identifier for each taxon (RE alife standard format).
‘ancestor_id’ : pl.UInt64
- Unique identifier for ancestor taxon (RE alife standard format).
‘dstream_rank’ : pl.UInt64
- Num generations elapsed for ancestral differentia.
- Corresponds to dstream_Tbar for inner nodes.
- Corresponds to dstream_T - 1 for leaf nodes.
‘hstrat_differentia_bitwidth’ : pl.UInt32
- Size of annotation differentiae, in bits.
- Corresponds to dstream_value_bitwidth.
‘dstream_S’ : pl.UInt32
- Capacity of dstream buffer used for hstrat surface, in number of data items (i.e., differentia values).

Optional schema:

‘dstream_data_id’ : pl.UInt64
- Unique identifier for each genome in the source genome dataframe.

delete_trunk : bool, default True

Should trunk nodes with rank less than dstream_S be deleted?

Trunk deletion accounts for “dummy” strata added to fill hstrat surface for founding ancestor(s), by segregating subtrees with distinct founding strata into independent trees.
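To make the effect of trunk deletion concrete, the following plain-Python sketch drops every node whose rank is below S and turns any survivor that loses its ancestor into the root of its own tree. It illustrates the idea only and is not the library's implementation: `delete_trunk_nodes` is a hypothetical helper, the tree is a `{id: ancestor_id}` dict, and roots are assumed to be their own ancestor.

```python
def delete_trunk_nodes(ancestor_of, rank_of, S):
    """Remove nodes with rank < S; orphaned survivors become roots.

    `ancestor_of` maps node id -> ancestor id and `rank_of` maps
    node id -> rank. Both structures are assumptions of this sketch.
    """
    keep = {node for node in ancestor_of if rank_of[node] >= S}
    return {
        # A kept node whose ancestor was deleted becomes its own ancestor,
        # i.e., the root of an independent tree.
        node: ancestor_of[node] if ancestor_of[node] in keep else node
        for node in sorted(keep)
    }


# Nodes 0 and 1 form the trunk (rank < S); 2 and 3 descend from node 1.
ancestors = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}
ranks = {0: 0, 1: 1, 2: 2, 3: 2, 4: 5}
forest = delete_trunk_nodes(ancestors, ranks, S=2)
```

After deletion, nodes 2 and 3 no longer share an ancestor, so the result is two independent trees rooted at 2 and 3.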

trie_postprocessor : Callable, default hstrat.NopTriePostprocessor()

Tree postprocess functor.

Must accept the parameters trie (type pandas.DataFrame), p_differentia_collision (type float), mutate (type bool), and progress_wrap (type Callable). Must return the postprocessed trie (type pl.DataFrame).

To apply multiple postprocessors, use hstrat.CompoundTriePostprocessor.
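As a rough illustration of the calling convention described above, a custom functor might look like the following. This is a sketch under stated assumptions: the class name `CopyingNopPostprocessor` is hypothetical, the default values for `mutate` and `progress_wrap` are guesses rather than documented behavior, and the body performs no real postprocessing.

```python
from typing import Any, Callable


class CopyingNopPostprocessor:
    """Hypothetical no-op functor using the documented parameter names."""

    def __call__(
        self,
        trie: Any,  # documented as a DataFrame; anything with .copy() works here
        p_differentia_collision: float,
        mutate: bool = False,  # default is an assumption of this sketch
        progress_wrap: Callable = lambda x: x,  # default is an assumption
    ) -> Any:
        # Honor `mutate`: only work on the caller's object when permitted.
        result = trie if mutate else trie.copy()
        # A real postprocessor would adjust `result` here, for example by
        # adding an origin time estimate for each node.
        return result


# Stand-in for a trie table; a plain dict is used so the sketch runs
# without third-party dependencies.
toy_trie = {"id": [0, 1], "ancestor_id": [0, 0]}
nop = CopyingNopPostprocessor()
copied = nop(toy_trie, p_differentia_collision=0.5)
in_place = nop(toy_trie, p_differentia_collision=0.5, mutate=True)
```

An instance of such a class would be passed as the trie_postprocessor argument.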

Returns

pl.DataFrame

The output DataFrame containing the estimated phylogenetic tree in alife standard format, with the following columns:

Required schema:

‘id’ : pl.UInt64
- Unique identifier for each taxon (RE alife standard format).
‘ancestor_id’ : pl.UInt64
- Unique identifier for ancestor taxon (RE alife standard format).
‘hstrat_rank’ : pl.Int64
- Num generations elapsed for ancestral differentia.
- Corresponds to dstream_Tbar - dstream_S for inner nodes.
- Corresponds to dstream_T - 1 - dstream_S for leaf nodes.
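Comparing the input and output schemas, the output hstrat_rank appears to be the input dstream_rank shifted down by dstream_S. The arithmetic can be sketched as follows; `to_hstrat_rank` is a hypothetical helper written for illustration, and the numeric values are made up.

```python
def to_hstrat_rank(dstream_rank: int, dstream_S: int) -> int:
    """Shift a dstream rank by the surface capacity S (illustrative only)."""
    return dstream_rank - dstream_S


S = 64

# Inner node: dstream_rank corresponds to dstream_Tbar.
inner_rank = to_hstrat_rank(dstream_rank=100, dstream_S=S)

# Leaf node: dstream_rank corresponds to dstream_T - 1.
dstream_T = 129
leaf_rank = to_hstrat_rank(dstream_rank=dstream_T - 1, dstream_S=S)

# A rank below S (a trunk node) would map to a negative value.
trunk_rank = to_hstrat_rank(dstream_rank=10, dstream_S=S)
```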

Optional schema:

‘origin_time’ : pl.Int64
- Estimated origin time for phylogeny nodes, in generations elapsed since founding ancestor.
- Value depends on the trie postprocessor used.

Additional user-defined columns will be forwarded from the input DataFrame. Any columns created by the trie postprocessor will also be included.

Note that the alife-standard ancestor_list column is not included in the output.

Notes

Collapsing trunk nodes with rank less than dstream_S assumes that S “dummy” strata were added to fill hstrat surface for founding ancestor(s).

Currently, data is converted to Pandas for processing, then back to Polars.
