surface_test_drive

surface_test_drive(df: polars.lazyframe.frame.LazyFrame, *, dstream_algo: str, dstream_S: int, dstream_T_bitwidth: int = 32, progress_wrap: typing.Callable = <function <lambda>>, stratum_differentia_bit_width: int) → DataFrame

Reads an alife-standard phylogeny DataFrame and creates a population of hstrat surface annotations corresponding to the phylogeny tips, “as if” they had evolved according to the provided phylogeny history.

Parameters

df : pl.DataFrame

The input DataFrame containing an alife-standard phylogeny, with one row per taxon and the required columns listed below.

Note that the alife-standard ancestor_list column is not required.

Required schema:

‘id’ : pl.UInt64
- Taxon identifier.
‘ancestor_id’ : pl.UInt64
- Taxon identifier of ancestor.
- Own ‘id’ if root.

Optional schema:

‘origin_time’ : pl.UInt64
- Number of generations elapsed from ancestor.
- Determines branch lengths.
- If omitted, all branches are assumed to have length 1.
‘extant’ : pl.Boolean
- Whether an entry corresponding to this phylogeny taxon should be included in the output population.
- If omitted, all tips are considered extant and all inner nodes are not.

Additional user-defined columns will be forwarded to the output DataFrame.
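
The root convention and the default for ‘extant’ described above can be sketched in plain Python. This is an illustrative sketch only, not hstrat's implementation; the helper name `default_extant` and the toy records are hypothetical.

```python
# Hypothetical toy phylogeny, one record per taxon: (id, ancestor_id).
# Taxon 0 is the root, so its ancestor_id equals its own id.
records = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 1)]


def default_extant(records):
    """Illustrative helper (not part of hstrat): flag tips as extant.

    Mirrors the documented default used when no 'extant' column is given:
    tips are considered extant, inner nodes are not.
    """
    # A taxon is an inner node if some *other* taxon names it as ancestor;
    # the root's reference to itself does not count.
    inner = {anc for id_, anc in records if anc != id_}
    return {id_: id_ not in inner for id_, _ in records}


# Roots are the taxa whose ancestor_id is their own id.
roots = [id_ for id_, anc in records if id_ == anc]
extant = default_extant(records)
```

In this toy phylogeny, taxa 2, 3, and 4 are tips and would appear in the output population, while taxa 0 and 1 are inner nodes and would not.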

dstream_algo : str

Name of downstream curation algorithm to use.

dstream_S : int

Capacity of annotation dstream buffer, in number of data items.

dstream_T_bitwidth : int, default 32

Size of the annotation dstream counter field, in bits.

progress_wrap : Callable, optional

Pass tqdm or equivalent to display a progress bar.
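
Any callable that accepts an iterable and returns an iterable over the same items should be able to stand in for tqdm here. The sketch below assumes, as with tqdm, that the function calls `progress_wrap(iterable)` and iterates over the result; `simple_progress` is a hypothetical name, not part of hstrat.

```python
import sys


def simple_progress(iterable, every=1000):
    """Hypothetical tqdm stand-in: yield items unchanged, log a running count."""
    count = 0
    for count, item in enumerate(iterable, start=1):
        if count % every == 0:
            # Report on stderr so that stdout stays clean.
            print(f"processed {count} items", file=sys.stderr)
        yield item
    print(f"done: {count} items", file=sys.stderr)


# The wrapper must pass every item through unchanged and in order.
items = list(simple_progress(range(5), every=2))
```

Under that assumption, it would be supplied as `progress_wrap=simple_progress`.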

stratum_differentia_bit_width : int

The bit width of the generated differentia.

Returns

pl.DataFrame

The output DataFrame containing generated hstrat surface annotations.

Output schema:

‘data_hex’ : pl.String
- Raw genome data, with serialized dstream buffer and counter.
- Represented as a hexadecimal string.
‘downstream_version’ : pl.Categorical
- Version of downstream library used.
‘dstream_algo’ : pl.Categorical
- Name of downstream curation algorithm used.
- e.g., ‘dstream.steady_algo’
‘dstream_storage_bitoffset’ : pl.UInt32
- Position of dstream buffer field in ‘data_hex’.
‘dstream_storage_bitwidth’ : pl.UInt32
- Size of dstream buffer field in ‘data_hex’.
‘dstream_T_bitoffset’ : pl.UInt32
- Position of dstream counter field in ‘data_hex’.
‘dstream_T_bitwidth’ : pl.UInt32
- Size of dstream counter field in ‘data_hex’.
‘dstream_S’ : pl.UInt32
- Capacity of dstream buffer, in number of data items.
‘origin_time’ : pl.UInt64
- Number of generations elapsed since the founding ancestor.
‘td_source_id’ : pl.UInt64
- Corresponding taxon identifier in source phylogeny.

Additional user-defined columns will be forwarded from the input DataFrame.
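
The bit offset and bit width columns describe where each field sits inside ‘data_hex’. The sketch below shows one way such a field might be read back. It assumes that bit offsets count from the start of the hex string, that fields are aligned to whole hex digits (multiples of 4 bits), and that values are big-endian unsigned integers; these are assumptions about the layout, not guarantees stated on this page. The helper `extract_field` and the example row are hypothetical.

```python
def extract_field(data_hex: str, bitoffset: int, bitwidth: int) -> int:
    """Illustrative helper (not part of hstrat): read one field from data_hex.

    Assumes offsets count from the first hex digit, fields are aligned to
    4-bit boundaries, and the field is a big-endian unsigned integer.
    """
    if bitoffset % 4 or bitwidth % 4:
        raise ValueError("this sketch only handles hex-digit-aligned fields")
    start = bitoffset // 4
    stop = start + bitwidth // 4
    digits = data_hex[start:stop]
    if len(digits) != bitwidth // 4:
        raise ValueError("field extends past the end of data_hex")
    return int(digits, 16)


# Made-up row: a 32-bit counter followed by a 16-bit buffer.
row = {
    "data_hex": "0000002abeef",
    "dstream_T_bitoffset": 0,
    "dstream_T_bitwidth": 32,
    "dstream_storage_bitoffset": 32,
    "dstream_storage_bitwidth": 16,
}

counter = extract_field(
    row["data_hex"], row["dstream_T_bitoffset"], row["dstream_T_bitwidth"]
)
storage = extract_field(
    row["data_hex"],
    row["dstream_storage_bitoffset"],
    row["dstream_storage_bitwidth"],
)
```

For real data, the offsets and widths should be taken from each output row rather than hard-coded, since they depend on the arguments passed to `surface_test_drive`.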

Notes

Input columns “id”, “ancestor_id”, and “ancestor_list” are not forwarded to output, to avoid conflicts with the output schema for subsequent phylogeny reconstruction.