tree

Functions to reconstruct a phylogenetic tree from extant hereditary strata.

Modules

trie_postprocess

Implementation helpers.

Functions

build_tree(population, version_pin[, ...])

Estimate the phylogenetic history among hereditary stratigraphic columns.

build_tree_nj(population, estimator, prior)

Estimate the phylogenetic history among hereditary stratigraphic columns by using the "unweighted pair group method with neighbor joinint distance-based reconstruction method.

build_tree_trie(population[, taxon_labels, ...])

Estimate the phylogenetic history among hereditary stratigraphic columns by building a trie (a.k.a.

build_tree_trie_ensemble(population, ...[, ...])

Estimate the phylogenetic history among hereditary stratigraphic columns by building a trie (a.k.a.

build_tree_upgma(population, estimator, prior)

Estimate the phylogenetic history among hereditary stratigraphic columns by using the "unweighted pair group method with unweighted means" (UPGMA) distance-based reconstruction method.

Classes

AssignDestructionTimeYoungestPlusOneTriePostprocessor

Functor to assign a destruction time property to trie nodes.

AssignOriginTimeExpectedValueTriePostprocessor

Functor to assign origin time property to trie nodes using expected values over the distribution of possible differentia collisions.

AssignOriginTimeNaiveTriePostprocessor

Functor to assign origin time property to trie nodes calculated as the average of the node's rank and the minimum rank among its children.

AssignOriginTimeNodeRankTriePostprocessor

Functor to assign trie nodes' rank as their the origin time.

AssignOriginTimeSampleNaiveTriePostprocessor

Functor to assign origin time property to trie nodes sampled between the node's rank and the minimum rank among its children.

CompoundTriePostprocessor

Functor to sequentially apply multiple trie postprocessors.

PeelBackConjoinedLeavesTriePostprocessor

Functor to separate any TrieLeafNode instances that are direct siblings.

SampleAncestralRollbacksTriePostprocessor

Functor to correct for systematic overestimation of relatedness by sampling a compensatory adjustment to trie topology.

build_tree_nj(population: Sequence[Union[HereditaryStratigraphicColumn, HereditaryStratigraphicSpecimen]], estimator: str, prior: str | Any, taxon_labels: Iterable | None = None, force_common_ancestry: bool = False, negative_origin_time_correction_method: str | None = None) DataFrame

Estimate the phylogenetic history among hereditary stratigraphic columns by using the “unweighted pair group method with neighbor joinint distance-based reconstruction method.

This phylogenetic reconstruction approach is generally unfavorable, incuring O(n^2) runtime complexity and providing reconstructions that occasionally conflict with the hereditary stratigraphic record.

Parameters

population: Sequence[HereditaryStratigraphicArtifact]

Hereditary stratigraphic columns corresponding to extant population members.

Each member of population will correspond to a unique leaf node in the reconstructed tree.

estimator{“maximum_likelihood”, “unbiased”}

What estimation method should be used? Options are “maximum_likelihood” or “unbiased”.

See estimate_ranks_since_mrca_with for discussion of estimator options.

prior{“arbitrary”, “uniform”} or object implementing prior interface

Prior probability density distribution over possible generations of the MRCA.

See estimate_rank_of_mrca_between for discussion of prior options.

taxon_labels: Optional[Iterable], optional

How should leaf nodes representing extant hereditary stratigraphic columns be named?

Label order should correspond to the order of corresponding hereditary stratigraphic columns within population. If None, taxons will be named according to their numerical index.

force_common_ancestry: bool, default False

How should columns that definitively share no common ancestry be handled?

If set to True, treat columns with no common ancestry as if they shared a common ancestor immediately before the genesis of the lineages. If set to False, columns within population that definitively do not share common ancestry will raise a ValueError.

negative_origin_time_correction_method

: {“truncate”, “shift”, “rescale”}, optional How should negative origin time estimates be corrected?

Returns

pd.DataFrame

The reconstructed phylogenetic tree in alife standard format.

build_tree_trie(population: ~typing.Sequence[~typing.Union[HereditaryStratigraphicColumn, HereditaryStratigraphicSpecimen]], taxon_labels: ~typing.Iterable | None = None, force_common_ancestry: bool = False, progress_wrap: ~typing.Callable = <function <lambda>>, seed: int | None = 1, bias_adjustment: ~typing.Literal['sample_ancestral_rollbacks'] | ~hstrat.phylogenetic_inference.priors._detail._PriorBase.PriorBase | ~hstrat.phylogenetic_inference.tree.trie_postprocess._detail._TriePostprocessorBase.TriePostprocessorBase | None = None) DataFrame

Estimate the phylogenetic history among hereditary stratigraphic columns by building a trie (a.k.a. prefix tree) of the differentia sequences of hereditary stratigraphic artifacts within a population.

Exhibits time complexity at most O(nlog(n)) for population size n.

Parameters

population: Sequence[HereditaryStratigraphicArtifact]

Hereditary stratigraphic columns corresponding to extant population members.

Each member of population will correspond to a unique leaf node in the reconstructed tree.

taxon_labels: Optional[Iterable], optional

How should leaf nodes representing extant hereditary stratigraphic columns be named?

Label order should correspond to the order of corresponding hereditary stratigraphic columns within population. If None, taxons will be named according to their numerical index.

force_common_ancestry: bool, default False

How should columns that definitively share no common ancestry be handled?

If set to True, treat columns with no common ancestry as if they shared a common ancestor immediately before the genesis of the lineages. If set to False, columns within population that definitively do not share common ancestry will raise a ValueError.

progress_wrapCallable, default identity function

Pass tqdm or equivalent to display progress bars.

seedint, default 1

Controls tiebreaking decisions in the algorithm.

Pass an int for reproducible output across multiple function calls. The default value, 1, ensures reproducible output. Pass None to use global RNG context.

bias_adjustment : “sample_ancestral_rollbacks”, PriorBase, or TriePostProcessorBase, optional

How should bias toward overestimation of relatedness due to differentia collisions be corrected for?

If “sample_ancestral_rollbacks”, the trie topology will be adjusted as if the expected number of collisions had occurred. Targets for “unzipping” to reverse the effect of a speculated collision are chosen randomly from within the tree. See SampleAncestralRollbacksTriePostprocessor for details.

If a prior functor is passed, the origin time for each trie node will be calculated as the expected origin time over the distribution of possible differentia collisions. Correction recursively takes into account the possibility of multiple collisions. See hstrat.phylogenetic_inference.priors for available prior distributions. A custom prior distribution may also be supplied. See AssignOriginTimeExpectedValueTriePostprocessor for details.

If a prior functor is passed, correction for guaranteed-spurious collision between most-recent strata will also be performed. See PeelBackConjoinedLeavesTriePostprocessor for details.

If None, no correction will be performed. The origin time for each trie node will be assigned using a naive strategy, calculated as the average of the node’s rank and the minimum rank among its children. See AssignOriginTimeNaiveTriePostprocessor for details.

Returns

pd.DataFrame

The reconstructed phylogenetic tree in alife standard format.

Notes

Unifurcations in the reconstructed tree are collapsed.

However, polytomies are not resolved. In addition to any true polytomies, ancestry sequences that cannot be resolved due to missing information appear as polytomies in the generated reconstruction. Therefore, polytomies are generally overrepresented in reconstructions, especially when low hereditary stratigraphic resolution is available. If overestimation of polytomies is problematic, external tools can be used to decompose polytomies into arbitrarily-arranged bifurcations.

build_tree_trie_ensemble(population: ~typing.Sequence[~typing.Union[HereditaryStratigraphicColumn, HereditaryStratigraphicSpecimen]], trie_postprocessors: ~typing.Sequence[~typing.Callable], taxon_labels: ~typing.Iterable | None = None, force_common_ancestry: bool = False, progress_wrap: ~typing.Callable = <function <lambda>>, seed: int | None = 1) List[DataFrame]

Estimate the phylogenetic history among hereditary stratigraphic columns by building a trie (a.k.a. prefix tree) of the differentia sequences of hereditary stratigraphic artifacts within a population.

Returns phylogeny reconstruction outcomes from alternate postprocessing schemes applied between trie contruction and conversion to an alife standard data frame, including ancestor taxon origin time estimation. Because the underlying pre-postprocess trie is only constructed once, this method allows for efficient comparison of potprocessing schemes.

Unless comparing alternate postprocessing schemes or applying a custom postprocessing option, end users should likely prefer build_tree_trie. This interface applies a single postprocess, set to a generally-appropriate default with a few other curated postprocesses specifiable by optional argument.

Parameters

population: Sequence[HereditaryStratigraphicArtifact]

Hereditary stratigraphic columns corresponding to extant population members.

Each member of population will correspond to a unique leaf node in the reconstructed tree.

trie_postprocessors: Iterable[Callable]

Tree postprocess functors.

Must take trie of type TrieInnerNode, p_differentia_collision of type float, mutate of type bool, and progress_wrap of type Callable params. Must returned postprocessed trie (type TrieInnerNode). Several

Each postprocess will be called indpendently to produce a returned postprocessed variant. Use CompoundTriePostprocessor to chain postprocess functors that should be applied in succession.

taxon_labels: Optional[Iterable], optional

How should leaf nodes representing extant hereditary stratigraphic columns be named?

Label order should correspond to the order of corresponding hereditary stratigraphic columns within population. If None, taxons will be named according to their numerical index.

force_common_ancestry: bool, default False

How should columns that definitively share no common ancestry be handled?

If set to True, treat columns with no common ancestry as if they shared a common ancestor immediately before the genesis of the lineages. If set to False, columns within population that definitively do not share common ancestry will raise a ValueError.

progress_wrapCallable, default identity function

Wrapper applied around generation iterator and row generator for final phylogeny compilation process.

Pass tqdm or equivalent to display progress bars.

seed: int, default 1

Controls tiebreaking decisions in the algorithm.

Pass an int for reproducible output across multiple function calls. The default value, 1, ensures reproducible output. Pass None to use existing RNG context directly.

Returns

typing.List[pd.DataFrame]

Reconstructed phylogenetic trees with each postprocessor applied, in alife standard format.

build_tree_upgma(population: Sequence[Union[HereditaryStratigraphicColumn, HereditaryStratigraphicSpecimen]], estimator: str, prior: str | Any, taxon_labels: Iterable | None = None, force_common_ancestry: bool = False, negative_origin_time_correction_method: str | None = None) DataFrame

Estimate the phylogenetic history among hereditary stratigraphic columns by using the “unweighted pair group method with unweighted means” (UPGMA) distance-based reconstruction method.

This phylogenetic reconstruction approach is generally unfavorable, incuring O(n^2) runtime complexity and providing reconstructions that occasionally conflict with the hereditary stratigraphic record.

Parameters

population: Sequence[HereditaryStratigraphicArtifact]

Hereditary stratigraphic columns corresponding to extant population members.

Each member of population will correspond to a unique leaf node in the reconstructed tree.

estimator{“maximum_likelihood”, “unbiased”}

What estimation method should be used? Options are “maximum_likelihood” or “unbiased”.

See estimate_ranks_since_mrca_with for discussion of estimator options.

prior{“arbitrary”, “uniform”} or object implementing prior interface

Prior probability density distribution over possible generations of the MRCA.

See estimate_rank_of_mrca_between for discussion of prior options.

taxon_labels: Optional[Iterable], optional

How should leaf nodes representing extant hereditary stratigraphic columns be named?

Label order should correspond to the order of corresponding hereditary stratigraphic columns within population. If None, taxons will be named according to their numerical index.

force_common_ancestry: bool, default False

How should columns that definitively share no common ancestry be handled?

If set to True, treat columns with no common ancestry as if they shared a common ancestor immediately before the genesis of the lineages. If set to False, columns within population that definitively do not share common ancestry will raise a ValueError.

negative_origin_time_correction_method

: {“truncate”, “shift”, “rescale”}, optional How should negative origin time estimates be corrected?

Returns

pd.DataFrame

The reconstructed phylogenetic tree in alife standard format.

build_tree(population: ~typing.Sequence[~typing.Union[HereditaryStratigraphicColumn, HereditaryStratigraphicSpecimen]], version_pin: str, taxon_labels: ~typing.Iterable | None = None, force_common_ancestry: bool = False, progress_wrap: ~typing.Callable = <function <lambda>>) DataFrame

Estimate the phylogenetic history among hereditary stratigraphic columns.

This function provides a generic interface that directs to an underlying implementation, chosen for flexibility and robustness across reconstruction scenarios. The underlying implementation may be revised in future releases of the software.

The current implementation delegates to build_tree_trie.

Parameters

population: Sequence[HereditaryStratigraphicArtifact]

Hereditary stratigraphic columns corresponding to extant population members.

Each member of population will correspond to a unique leaf node in the reconstructed tree.

version_pin: str

How should calls to this function resolve in future hstrat releases?

To prevent future releases from silently substituting the underlying reconstruction algorithm, hardcode the current version of hstrat here. To automatically track any future library updates, pass hstrat.__version__ here.

Some effort will be made to maintain historical implementations to support prior version pins. However, indefinite support is not guaranteed for hard version pins; old version pins may eventually raise DeprecationWarning or ValueError. Where reasonable, consider directly calling an implementing tree building method instead.

taxon_labels: Optional[Iterable], optional

How should leaf nodes representing extant hereditary stratigraphic columns be named?

Label order should correspond to the order of corresponding hereditary stratigraphic columns within population. If None, taxons will be named according to their numerical index.

force_common_ancestry: bool, optional

How should columns that definitively share no common ancestry be handled?

If set to True, treat columns with no common ancestry as if they shared a common ancestor immediately before the genesis of the lineages. If set to False, columns within population that definitively do not share common ancestry will raise a ValueError.

progress_wrapCallable, default identity function

Pass tqdm or equivalent to display progress bars.

Returns

pd.DataFrame

The reconstructed phylogenetic tree in alife standard format.

Raises

ValueError

If the specified version_pin is higher than the current version of hstrat.

class AssignDestructionTimeYoungestPlusOneTriePostprocessor[source]

Functor to assign a destruction time property to trie nodes.

Destruction time of leaf nodes are set to infinity. Destruction time of inner nodes is calculated as the minimum of its children’s origin times plus one.

__init__(assigned_property: str = 'destruction_time', origin_time_property: str = 'origin_time') None[source]

Initialize functor instance.

Parameters

assigned_propertystr, default “destruction_time”

The property name for the assigned destruction tim.

origin_time_propertystr, default “origin_time”

The property name for the node’s origin time.

__call__(trie: ~hstrat.phylogenetic_inference.tree._impl._TrieInnerNode.TrieInnerNode, p_differentia_collision: float, mutate: bool = False, progress_wrap: ~typing.Callable = <function AssignDestructionTimeYoungestPlusOneTriePostprocessor.<lambda>>) TrieInnerNode[source]

Assign destruction times to trie nodes based on their childrens’ origin times.

Parameters

trieTrieInnerNode

The input trie to be postprocessed.

p_differentia_collisionfloat

The multiplicative inverse of the number of possible differentia.

Not used in the current implementation.

mutatebool, default False

Are side effects on the input argument trie allowed?

progress_wraptyping.Callable, optional

Pass tqdm or equivalent to report progress.

Returns

TrieInnerNode

The postprocessed trie with assigned destruction times.

class AssignOriginTimeExpectedValueTriePostprocessor[source]

Functor to assign origin time property to trie nodes using expected values over the distribution of possible differentia collisions.

Computes the origin time of trie nodes based on the expected value, taking into account the probability of differentia collision and the prior distribution expected for ancestor origin times.

__init__(prior: PriorBase, assigned_property: str = 'origin_time') None[source]

Initialize functor instance.

Parameters

priorPriorBase

The prior distribution used to calculate expected values.

assigned_propertystr, default “origin_time”

The property name for the assigned origin time.

__call__(trie: ~hstrat.phylogenetic_inference.tree._impl._TrieInnerNode.TrieInnerNode, p_differentia_collision: float, mutate: bool = False, progress_wrap: ~typing.Callable = <function AssignOriginTimeExpectedValueTriePostprocessor.<lambda>>) TrieInnerNode[source]

Assign origin times to trie nodes.

Parameters

trieTrieInnerNode

The input trie to be postprocessed.

p_differentia_collisionfloat

Probability of a randomly-generated differentia matching an existing differentia.

mutatebool, default False

Are side effects on the input argument trie allowed?

progress_wraptyping.Callable, optional

Pass tqdm or equivalent to report progress.

Returns

TrieInnerNode

The postprocessed trie with assigned origin times.

class AssignOriginTimeNaiveTriePostprocessor[source]

Functor to assign origin time property to trie nodes calculated as the average of the node’s rank and the minimum rank among its children.

Optionally calculates origin time expected value over this interval for a provided prior distribution.

__init__(prior: ~hstrat.phylogenetic_inference.priors._detail._PriorBase.PriorBase = <hstrat.phylogenetic_inference.priors.ArbitraryPrior object>, assigned_property: str = 'origin_time') None[source]

Initialize functor instance.

Parameters

priorPriorBase, default ArbitraryPrior()

Prior distribution of ancestor origin times.

Used to calculate interval means.

assigned_propertystr, default “origin_time”

The property name for the assigned origin time.

__call__(trie: ~hstrat.phylogenetic_inference.tree._impl._TrieInnerNode.TrieInnerNode, p_differentia_collision: float, mutate: bool = False, progress_wrap: ~typing.Callable = <function AssignOriginTimeNaiveTriePostprocessor.<lambda>>) TrieInnerNode[source]

Assign origin times to trie nodes.

Parameters

trieTrieInnerNode

The input trie to be postprocessed.

p_differentia_collisionfloat

Probability of a randomly-generated differentia matching an existing differentia.

Not used in the current implementation.

mutatebool, default False

Are side effects on the input argument trie allowed?

progress_wraptyping.Callable, optional

Pass tqdm or equivalent to report progress.

Returns

TrieInnerNode

The postprocessed trie with assigned origin times.

class AssignOriginTimeNodeRankTriePostprocessor[source]

Functor to assign trie nodes’ rank as their the origin time.

__init__(assigned_property: str = 'origin_time') None[source]

Initialize functor instance.

Parameters

assigned_propertystr, default “origin_time”

The property name for the assigned origin time.

__call__(trie: ~hstrat.phylogenetic_inference.tree._impl._TrieInnerNode.TrieInnerNode, p_differentia_collision: float, mutate: bool = False, progress_wrap: ~typing.Callable = <function AssignOriginTimeNodeRankTriePostprocessor.<lambda>>) TrieInnerNode[source]

Assign origin times to trie nodes.

Parameters

trieTrieInnerNode

The input trie to be postprocessed.

p_differentia_collisionfloat

Probability of a randomly-generated differentia matching an existing differentia. Not used in the current implementation.

mutatebool, default False

Are side effects on the input argument trie allowed?

progress_wraptyping.Callable, optional

Pass tqdm or equivalent to report progress.

Returns

TrieInnerNode

The postprocessed trie with assigned origin times.

class AssignOriginTimeSampleNaiveTriePostprocessor[source]

Functor to assign origin time property to trie nodes sampled between the node’s rank and the minimum rank among its children.

A prior may be provided to customize sampling distribution used.

__init__(prior: ~hstrat.phylogenetic_inference.priors._detail._PriorBase.PriorBase = <hstrat.phylogenetic_inference.priors.ArbitraryPrior object>, assigned_property: str = 'origin_time') None[source]

Initialize functor instance.

Parameters

priorPriorBase, default ArbitraryPrior()

Prior distribution of ancestor origin times.

Used to calculate interval samples.

assigned_propertystr, default “origin_time”

The property name for the assigned origin time.

__call__(trie: ~hstrat.phylogenetic_inference.tree._impl._TrieInnerNode.TrieInnerNode, p_differentia_collision: float, mutate: bool = False, progress_wrap: ~typing.Callable = <function AssignOriginTimeSampleNaiveTriePostprocessor.<lambda>>) TrieInnerNode[source]

Assign origin times to trie nodes.

Parameters

trieTrieInnerNode

The input trie to be postprocessed.

p_differentia_collisionfloat

Probability of a randomly-generated differentia matching an existing differentia.

Not used in the current implementation.

mutatebool, default False

Are side effects on the input argument trie allowed?

progress_wraptyping.Callable, optional

Pass tqdm or equivalent to report progress.

Returns

TrieInnerNode

The postprocessed trie with assigned origin times.

class CompoundTriePostprocessor[source]

Functor to sequentially apply multiple trie postprocessors.

Allows for the combination and sequential application of multiple trie postprocessors.

__init__(postprocessors: Iterable[TriePostprocessorBase]) None[source]

Initialize functor instance.

Parameters

postprocessorstyping.Iterable[TriePostprocessorBase]

The sequence of postprocess functors to be applied.

__call__(trie: ~hstrat.phylogenetic_inference.tree._impl._TrieInnerNode.TrieInnerNode, p_differentia_collision: float, mutate: bool = False, progress_wrap: ~typing.Callable = <function CompoundTriePostprocessor.<lambda>>) TrieInnerNode[source]

Apply stored postprocessors in sequence.

Parameters

trieTrieInnerNode

The input trie to be postprocessed.

p_differentia_collisionfloat

Probability of a randomly-generated differentia matching an existing differentia.

Forwarded to the postprocess functors.

mutatebool, default False

Are side effects on the input argument trie allowed?

progress_wraptyping.Callable, optional

Pass tqdm or equivalent to report progress.

Returns

TrieInnerNode

The postprocessed trie after applying all stored postprocessors.

class PeelBackConjoinedLeavesTriePostprocessor[source]

Functor to separate any TrieLeafNode instances that are direct siblings.

Corrects for guaranteed-spurious differentia collisions among most-recent strata.

__call__(trie: ~hstrat.phylogenetic_inference.tree._impl._TrieInnerNode.TrieInnerNode, p_differentia_collision: float, mutate: bool = False, progress_wrap: ~typing.Callable = <function PeelBackConjoinedLeavesTriePostprocessor.<lambda>>) TrieInnerNode[source]

Peel apart any TrieLeafNode nodes that are direct siblings.

Without reproduction dynamics that allow columns to be cloned without stratum deposit, two hereditary stratigraphic columns can only share their most-recent strata by collision.

Clones the sibling leaves’ parent node and attaches it to the the siblings’ grandparent node. One sibling node is then grafted away from its original parent and attached onto the newly cloned parent node as a child. The original parent node is left in place with any remaining children. This process is repeated until no TrieLeafNode’s remain as direct siblings.

Parameters:

trieTrieInnerNode

The root node of the trie to be unzipped.

p_differentia_collisionfloat

The multiplicative inverse of the number of possible differentia.

This fraction of possible rollbacks are performed.

mutatebool, default False

Are side effects on the input argument trie allowed?

progress_wraptyping.Callable, optional

Pass tqdm or equivalent to report progress.

Returns

TrieInnerNode

The postprocessed trie.

class SampleAncestralRollbacksTriePostprocessor[source]

Functor to correct for systematic overestimation of relatedness by sampling a compensatory adjustment to trie topology.

__init__(seed: int | None = None) None[source]

Initialize functor instance.

Parameters:

seed: int, default

Controls sampling decisions in the algorithm.

Pass an int for reproducible output across multiple function calls. The default value, 1, ensures reproducible output. Pass None to use existing RNG context directly.

__call__(trie: ~hstrat.phylogenetic_inference.tree._impl._TrieInnerNode.TrieInnerNode, p_differentia_collision: float, mutate: bool = False, progress_wrap: ~typing.Callable = <function SampleAncestralRollbacksTriePostprocessor.<lambda>>) TrieInnerNode[source]

Compensate for bias towards overestimating relatedness due to spurious differentia collisions.

Each rollback operation alters the tree as if a single spurious collision had occured; a single branch is adjusted to exhibit the next- most-ancient last commonality.

The number of rollback operations is calculated from the number of possible spurious collisions and the probability of spurious collision. Unzip targets are sampled randomly using the standard library random module.

Parameters:

trieTrieInnerNode

The root node of the trie to be unzipped.

p_differentia_collisionfloat

The multiplicative inverse of the number of possible differentia.

This fraction of possible rollbacks are performed.

mutatebool, default False

Are side effects on the input argument trie allowed?

progress_wraptyping.Callable, optional

Pass tqdm or equivalent to report progress.

Returns

TrieInnerNode

The postprocessed trie.

Notes:

This function assumes underlying shared genesis, so the root node of the trie is not eligible for rollback.