build_tree_trie

build_tree_trie(population: Sequence[Union[HereditaryStratigraphicColumn, HereditaryStratigraphicSpecimen]], taxon_labels: Iterable | None = None, force_common_ancestry: bool = False, progress_wrap: Callable = <function <lambda>>, seed: int | None = 1, bias_adjustment: Literal['sample_ancestral_rollbacks'] | PriorBase | TriePostprocessorBase | None = None) → DataFrame

Estimate the phylogenetic history among hereditary stratigraphic columns by building a trie (a.k.a. prefix tree) of the differentia sequences of hereditary stratigraphic artifacts within a population.

Exhibits time complexity at most O(n log(n)) for population size n.
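The trie idea described above can be sketched in plain Python. This is a toy illustration only, not the library's implementation: `TrieNode`, `build_toy_trie`, `count_nodes`, `labels_below`, and `shared_depth` are hypothetical names, and the sketch assumes every member retains a differentia at every rank, which real hereditary stratigraphic columns generally do not.

```python
from typing import Dict, List, Sequence, Set


class TrieNode:
    """One node per distinct prefix of differentia values."""

    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}
        self.taxa: List[str] = []  # labels of members whose sequence ends here


def build_toy_trie(sequences: Dict[str, Sequence[int]]) -> TrieNode:
    """Insert each member's differentia sequence, sharing common prefixes."""
    root = TrieNode()
    for label, differentiae in sequences.items():
        node = root
        for differentia in differentiae:
            node = node.children.setdefault(differentia, TrieNode())
        node.taxa.append(label)
    return root


def count_nodes(node: TrieNode) -> int:
    """Count this node plus all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


def labels_below(node: TrieNode) -> Set[str]:
    """Collect the labels stored at or beneath a node."""
    found = set(node.taxa)
    for child in node.children.values():
        found |= labels_below(child)
    return found


def shared_depth(root: TrieNode, a: str, b: str) -> int:
    """Return how many leading differentiae members a and b share."""
    depth, node = 0, root
    while True:
        shared = [
            child
            for child in node.children.values()
            if {a, b} <= labels_below(child)
        ]
        if not shared:
            return depth
        node, depth = shared[0], depth + 1


# Made-up differentia values; index within each list plays the role of rank.
sequences = {"a": [7, 3, 9, 4], "b": [7, 3, 9, 1], "c": [7, 3, 2, 8]}
trie = build_toy_trie(sequences)
print(count_nodes(trie))             # 8 (root plus seven prefix nodes)
print(shared_depth(trie, "a", "b"))  # 3
print(shared_depth(trie, "a", "c"))  # 2
```

In this toy, members that share a longer differentia prefix sit beneath a deeper common node, which is what lets the trie be read as an estimate of shared ancestry.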

Parameters

population: Sequence[HereditaryStratigraphicArtifact]

Hereditary stratigraphic columns corresponding to extant population members.

Each member of population will correspond to a unique leaf node in the reconstructed tree.

taxon_labels: Optional[Iterable], optional

How should leaf nodes representing extant hereditary stratigraphic columns be named?

Labels should be given in the same order as the corresponding hereditary stratigraphic columns within population. If None, taxa will be named according to their numerical index.

force_common_ancestry: bool, default False

How should columns that definitively share no common ancestry be handled?

If set to True, treat columns with no common ancestry as if they shared a common ancestor immediately before the genesis of the lineages. If set to False, columns within population that definitively do not share common ancestry will raise a ValueError.

progress_wrap: Callable, default identity function

Pass tqdm or equivalent to display progress bars.
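As a rough sketch of what "tqdm or equivalent" implies, the wrapper is assumed here to take an iterable and return an iterable over the same items. The names `toy_worker` and `counting_wrap` are hypothetical stand-ins and do not appear in the library; `toy_worker` only mimics a function that accepts such a wrapper.

```python
from typing import Callable, Iterable, Iterator, List

seen: List[int] = []  # records progress so the wrapper's effect is visible


def counting_wrap(iterable: Iterable) -> Iterator:
    """Minimal tqdm-like wrapper: pass items through while tracking progress."""
    for count, item in enumerate(iterable, start=1):
        seen.append(count)
        yield item


def toy_worker(
    items: Iterable[int],
    progress_wrap: Callable = lambda x: x,  # identity function by default
) -> List[int]:
    """Stand-in for a long-running loop that accepts a progress wrapper."""
    return [item * 2 for item in progress_wrap(items)]


plain = toy_worker([1, 2, 3])
wrapped = toy_worker([1, 2, 3], progress_wrap=counting_wrap)
```

The wrapper changes only what is reported during iteration; the computed result is the same with or without it.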

seed: int, default 1

Controls tiebreaking decisions in the algorithm.

Pass an int for reproducible output across multiple function calls. The default value, 1, ensures reproducible output. Pass None to use global RNG context.
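The seeding behavior described above can be illustrated with the standard library. This is one possible way to obtain that behavior, written as an assumption for illustration; it does not show how the library handles its RNG internally, and `pick_tiebreak` is a hypothetical helper.

```python
import random
from typing import Optional, Sequence


def pick_tiebreak(candidates: Sequence[str], seed: Optional[int] = 1) -> str:
    """Choose among tied candidates, reproducibly when a seed is given."""
    # An int seed gets a private generator; None falls back to the global RNG.
    rng = random.Random(seed) if seed is not None else random
    return rng.choice(candidates)


options = ["x", "y", "z"]
first = pick_tiebreak(options, seed=1)
second = pick_tiebreak(options, seed=1)  # same seed, same choice

# With seed=None the result depends on the caller's global RNG state.
random.seed(2024)
from_global_a = pick_tiebreak(options, seed=None)
random.seed(2024)
from_global_b = pick_tiebreak(options, seed=None)
```

Here `first` and `second` always agree, while the `seed=None` calls agree only because the global RNG was reseeded identically before each call.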

bias_adjustment: “sample_ancestral_rollbacks”, PriorBase, or TriePostprocessorBase, optional

How should bias toward overestimation of relatedness due to differentia collisions be corrected for?

If “sample_ancestral_rollbacks”, the trie topology will be adjusted as if the expected number of collisions had occurred. Targets for “unzipping” to reverse the effect of a speculated collision are chosen randomly from within the tree. See SampleAncestralRollbacksTriePostprocessor for details.

If a prior functor is passed, the origin time for each trie node will be calculated as the expected origin time over the distribution of possible differentia collisions. Correction recursively takes into account the possibility of multiple collisions. See hstrat.phylogenetic_inference.priors for available prior distributions. A custom prior distribution may also be supplied. See AssignOriginTimeExpectedValueTriePostprocessor for details.

Passing a prior functor also enables correction for the guaranteed-spurious collision between most-recent strata. See PeelBackConjoinedLeavesTriePostprocessor for details.

If None, no correction will be performed. The origin time for each trie node will be assigned using a naive strategy, calculated as the average of the node’s rank and the minimum rank among its children. See AssignOriginTimeNaiveTriePostprocessor for details.
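The naive strategy amounts to a one-line calculation. The sketch below uses a hypothetical helper, `naive_origin_time`, and assumes that a node without children simply keeps its own rank; that leaf case is not stated in the text above.

```python
from typing import Sequence


def naive_origin_time(node_rank: int, child_ranks: Sequence[int]) -> float:
    """Average a node's rank with the minimum rank among its children."""
    if not child_ranks:
        # Assumption for this sketch: a childless node keeps its own rank.
        return float(node_rank)
    return (node_rank + min(child_ranks)) / 2


print(naive_origin_time(4, [8, 12]))  # 6.0, halfway between rank 4 and rank 8
print(naive_origin_time(10, []))      # 10.0
```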

Returns

pd.DataFrame

The reconstructed phylogenetic tree in alife standard format.

Notes

Unifurcations in the reconstructed tree are collapsed.

However, polytomies are not resolved. In addition to any true polytomies, ancestry sequences that cannot be resolved due to missing information appear as polytomies in the generated reconstruction. Therefore, polytomies are generally overrepresented in reconstructions, especially when low hereditary stratigraphic resolution is available. If overestimation of polytomies is problematic, external tools can be used to decompose polytomies into arbitrarily-arranged bifurcations.
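Both notes can be illustrated on a toy tree stored as a parent-to-children mapping. The sketch below is not the library's code and does not stand in for any particular external tool: `collapse_unifurcations` and `resolve_polytomies` are hypothetical helpers, the sketch drops a single-child root along with other single-child nodes, and the generated `inner_N` labels are assumed not to clash with existing node names.

```python
import itertools
from typing import Dict, List

Tree = Dict[str, List[str]]


def collapse_unifurcations(children: Tree, root: str) -> Tree:
    """Splice out every node that has exactly one child."""

    def skip_chain(node: str) -> str:
        # Walk down through single-child nodes to the first node that is not one.
        while len(children.get(node, [])) == 1:
            node = children[node][0]
        return node

    collapsed: Tree = {}
    stack = [skip_chain(root)]
    while stack:
        node = stack.pop()
        kids = [skip_chain(kid) for kid in children.get(node, [])]
        collapsed[node] = kids
        stack.extend(kids)
    return collapsed


def resolve_polytomies(children: Tree, root: str) -> Tree:
    """Split nodes with more than two children into arbitrary bifurcations."""
    resolved: Tree = {}
    fresh = itertools.count()
    stack = [root]
    while stack:
        node = stack.pop()
        kids = list(children.get(node, []))
        stack.extend(kids)  # original children still need visiting
        while len(kids) > 2:
            # Group the last two children under a newly created inner node.
            inner = f"inner_{next(fresh)}"
            resolved[inner] = kids[-2:]
            kids = kids[:-2] + [inner]
        resolved[node] = kids
    return resolved


chain_tree = {
    "root": ["x"],
    "x": ["y", "z"],
    "y": ["leaf_a"],
    "z": ["leaf_b", "leaf_c"],
}
collapsed = collapse_unifurcations(chain_tree, "root")
print(collapsed["x"])  # ['leaf_a', 'z']

star_tree = {"root": ["a", "b", "c", "d"]}
resolved = resolve_polytomies(star_tree, "root")
print(resolved["root"])  # ['a', 'inner_1']
```

The bifurcations produced this way are arbitrary: they remove the polytomy without adding any information about the true branching order.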