surface_postprocess_trie

surface_postprocess_trie(df: DataFrame, *, drop_dstream_metadata: bool | None = None, trie_postprocessor: Callable = <hstrat.phylogenetic_inference.tree.trie_postprocess._NopTriePostprocessor.NopTriePostprocessor object>, delete_trunk: bool = True) -> DataFrame

Postprocess raw phylogenetic tree reconstruction output data to create a finalized estimate of phylogenetic history.

Performs the following operations:
  • Delete trunk nodes with rank less than dstream_S (when delete_trunk is True).
  • Collapse unifurcations.
  • Assign contiguous IDs to nodes.
  • Apply supplied trie_postprocessor functor.
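
The two middle steps can be illustrated on a toy tree. This is a minimal sketch in plain Python, not the library's implementation: the helper names collapse_unifurcations and assign_contiguous_ids are hypothetical, the tree is a dict mapping each node id to its ancestor id, and roots are assumed to be encoded as their own ancestor and to be kept even when they have a single child.

```python
from collections import defaultdict


def collapse_unifurcations(parent: dict[int, int]) -> dict[int, int]:
    """Drop non-root nodes that have exactly one child.

    `parent` maps node id -> ancestor id; a root maps to itself (assumption).
    """
    children = defaultdict(list)
    for node, anc in parent.items():
        if node != anc:
            children[anc].append(node)

    def is_unifurcation(node: int) -> bool:
        return parent[node] != node and len(children[node]) == 1

    result = {}
    for node, anc in parent.items():
        if is_unifurcation(node):
            continue  # node is spliced out of the tree
        # Climb past any chain of single-child ancestors.
        while is_unifurcation(anc):
            anc = parent[anc]
        result[node] = anc
    return result


def assign_contiguous_ids(parent: dict[int, int]) -> dict[int, int]:
    """Relabel nodes 0..n-1, preserving the relative order of the old ids."""
    new_id = {old: new for new, old in enumerate(sorted(parent))}
    return {new_id[node]: new_id[anc] for node, anc in parent.items()}


# Toy tree: 0 is the root, 1 has a single child (2), and 2 branches into 3 and 4.
toy = {0: 0, 1: 0, 2: 1, 3: 2, 4: 2, 7: 0}
collapsed = collapse_unifurcations(toy)
relabeled = assign_contiguous_ids(collapsed)
print(collapsed)  # {0: 0, 2: 0, 3: 2, 4: 2, 7: 0}
print(relabeled)  # {0: 0, 1: 0, 2: 1, 3: 1, 4: 0}
```

Node 1 is removed because it only links node 0 to node 2, and the surviving ids are then renumbered without gaps.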

Parameters

df : pl.DataFrame

The input DataFrame containing packed data with required columns, one row per genome.

Required schema:
  • ‘id’ : pl.UInt64
    • Unique identifier for each taxon (RE alife standard format).

  • ‘ancestor_id’ : pl.UInt64
    • Unique identifier for ancestor taxon (RE alife standard format).

  • ‘dstream_rank’ : pl.UInt64
    • Num generations elapsed for ancestral differentia.

    • Corresponds to dstream_Tbar for inner nodes.

    • Corresponds to dstream_T - 1 for leaf nodes.

  • ‘hstrat_differentia_bitwidth’ : pl.UInt32
    • Size of annotation differentiae, in bits.

    • Corresponds to dstream_value_bitwidth.

  • ‘dstream_S’ : pl.UInt32
    • Capacity of dstream buffer used for hstrat surface, in number of data items (i.e., differentia values).

Optional schema:
  • ‘dstream_data_id’ : pl.UInt64
    • Unique identifier for each genome in the source genome dataframe.
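
A quick pre-flight check against the required schema above can catch a missing column before postprocessing starts. The sketch below is illustrative only: REQUIRED_COLUMNS and find_missing_columns are hypothetical names, not part of hstrat, and only column names are checked (the dtype strings are carried along for reference).

```python
# Required input columns and their documented Polars dtypes.
REQUIRED_COLUMNS = {
    "id": "UInt64",
    "ancestor_id": "UInt64",
    "dstream_rank": "UInt64",
    "hstrat_differentia_bitwidth": "UInt32",
    "dstream_S": "UInt32",
}


def find_missing_columns(columns) -> list[str]:
    """Return required column names absent from `columns`, in schema order."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Column names of a hypothetical, incomplete input frame.
example_columns = ["id", "ancestor_id", "dstream_rank", "dstream_data_id"]
missing = find_missing_columns(example_columns)
print(missing)  # ['hstrat_differentia_bitwidth', 'dstream_S']
```

With a Polars frame, the same helper could be called as find_missing_columns(df.columns).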

delete_trunk : bool, default True

Should trunk nodes with rank less than dstream_S be deleted?

Trunk deletion accounts for “dummy” strata added to fill hstrat surface for founding ancestor(s), by segregating subtrees with distinct founding strata into independent trees.
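
To make the segregation concrete, here is a minimal sketch of trunk deletion on a toy tree. It is not the library's implementation: the helper name delete_trunk_nodes is hypothetical, nodes are held in a plain dict, and a root is assumed to be encoded as its own ancestor.

```python
def delete_trunk_nodes(
    nodes: dict[int, tuple[int, int]], dstream_S: int
) -> dict[int, int]:
    """Remove nodes with rank < dstream_S and re-root their surviving children.

    `nodes` maps id -> (ancestor_id, dstream_rank); a root has
    ancestor_id == id (assumption). Returns id -> ancestor_id.
    """
    kept = {}
    for node, (anc, rank) in nodes.items():
        if rank < dstream_S:
            continue  # trunk node: dropped
        anc_rank = nodes[anc][1]
        # A node whose ancestor was in the trunk becomes the root of its own tree.
        kept[node] = node if anc_rank < dstream_S else anc
    return kept


# Toy data with dstream_S = 8: nodes 0 and 1 form the trunk (ranks 0 and 7).
toy_nodes = {
    0: (0, 0),
    1: (0, 7),
    2: (1, 8),
    3: (1, 9),
    4: (2, 12),
    5: (3, 15),
}
forest = delete_trunk_nodes(toy_nodes, dstream_S=8)
roots = [node for node, anc in forest.items() if node == anc]
print(forest)  # {2: 2, 3: 3, 4: 2, 5: 3}
print(roots)   # [2, 3]
```

After deletion, nodes 2 and 3 no longer share an ancestor, so the result is two independent trees.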

trie_postprocessor : Callable, default hstrat.NopTriePostprocessor()

Tree postprocess functor.

Must accept the parameters trie (type pandas.DataFrame), p_differentia_collision (type float), mutate (type bool), and progress_wrap (type Callable). Must return the postprocessed trie (type pl.DataFrame).

To apply multiple postprocessors, use hstrat.CompoundTriePostprocessor.
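
For orientation, a custom functor matching the call signature described above might look like the following sketch. The class name RowCountPostprocessor is hypothetical, the default values for mutate and progress_wrap are assumptions, and a plain list stands in for the trie so the example runs without dependencies (like a DataFrame, it supports len() and .copy()).

```python
from typing import Callable


class RowCountPostprocessor:
    """Pass-through functor that records how many trie rows it was given."""

    def __init__(self) -> None:
        self.seen_rows = None

    def __call__(
        self,
        trie,
        p_differentia_collision: float,
        mutate: bool = False,  # default is an assumption
        progress_wrap: Callable = lambda x: x,  # default is an assumption
    ):
        self.seen_rows = len(trie)
        # Respect `mutate`: hand back a copy unless in-place changes are allowed.
        return trie if mutate else trie.copy()


# Stand-in for the trie; a real call would receive a DataFrame.
stand_in_trie = [{"id": 0}, {"id": 1}]
postprocessor = RowCountPostprocessor()
result = postprocessor(
    stand_in_trie,
    p_differentia_collision=1 / 256,  # illustrative value only
)
print(postprocessor.seen_rows)  # 2
```

An instance of such a class would be passed as the trie_postprocessor argument.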

Returns

pl.DataFrame

The output DataFrame containing the estimated phylogenetic tree in alife standard format, with the following columns:

Required schema:
  • ‘id’ : pl.UInt64
    • Unique identifier for each taxon (RE alife standard format).

  • ‘ancestor_id’ : pl.UInt64
    • Unique identifier for ancestor taxon (RE alife standard format).

  • ‘hstrat_rank’ : pl.Int64
    • Num generations elapsed for ancestral differentia.

    • Corresponds to dstream_Tbar - dstream_S for inner nodes.

    • Corresponds to dstream_T - 1 - dstream_S for leaf nodes.

Optional schema:
  • ‘origin_time’ : pl.Int64
    • Estimated origin time for phylogeny nodes, in generations elapsed since founding ancestor.

    • Value depends on the trie postprocessor used.

Additional user-defined columns will be forwarded from the input DataFrame. Any columns created by the trie postprocessor will also be included.

Note that the alife-standard ancestor_list column is not included in the output.
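
If an ancestor_list column is needed downstream, its contents follow directly from id and ancestor_id. The sketch below shows the encoding only, assuming roots are stored with ancestor_id equal to id; make_ancestor_list is a hypothetical helper, not the hstrat function listed under See Also.

```python
def make_ancestor_list(node_id: int, ancestor_id: int) -> str:
    """Format one alife-standard ancestor_list entry.

    Roots (assumed here to have ancestor_id == id) are written as "[none]".
    """
    return "[none]" if node_id == ancestor_id else f"[{ancestor_id}]"


# (id, ancestor_id) pairs for a three-node chain rooted at 0.
rows = [(0, 0), (1, 0), (2, 1)]
ancestor_lists = [make_ancestor_list(i, a) for i, a in rows]
print(ancestor_lists)  # ['[none]', '[0]', '[1]']
```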

Notes

Collapsing trunk nodes with rank less than dstream_S assumes that S “dummy” strata were added to fill hstrat surface for founding ancestor(s).
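
A small worked example shows how the documented offset relates input and output ranks. The numbers are made up for illustration, and the relation hstrat_rank = dstream_rank - dstream_S is read off the input and output schema descriptions rather than taken from the source code.

```python
dstream_S = 64  # surface capacity (illustrative value)
dstream_ranks = [10, 63, 64, 200]  # input ranks (illustrative values)

# Output rank is the input rank shifted down by the S "dummy" strata.
hstrat_ranks = [rank - dstream_S for rank in dstream_ranks]

# Nodes with rank below S belong to the trunk.
is_trunk = [rank < dstream_S for rank in dstream_ranks]

print(hstrat_ranks)  # [-54, -1, 0, 136]
print(is_trunk)      # [True, True, False, False]
```

Trunk nodes are exactly those whose shifted rank would be negative, which is consistent with hstrat_rank being a signed pl.Int64 column while dstream_rank is unsigned.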

Currently, data is converted to Pandas for processing, then back to Polars.

See Also

surface_unpack_reconstruct :

Creates raw reconstruction data postprocessed here.

alifestd_try_add_ancestor_list_col :

Adds alife-standard ancestor_list column to phylogeny data.