surface_test_drive
- surface_test_drive(df: polars.lazyframe.frame.LazyFrame, *, dstream_algo: str, dstream_S: int, dstream_T_bitwidth: int = 32, progress_wrap: typing.Callable = <function <lambda>>, stratum_differentia_bit_width: int) -> DataFrame
Reads an alife-standard phylogeny DataFrame and creates a population of hstrat surface annotations corresponding to the phylogeny tips, “as if” they had evolved according to the provided phylogeny history.
Parameters
- df : pl.DataFrame
The input DataFrame containing an alife-standard phylogeny with the required columns, one row per taxon.
Note that the alife-standard ancestor_list column is not required.
- Required schema:
- ‘id’ : pl.UInt64
Taxon identifier.
- ‘ancestor_id’ : pl.UInt64
Taxon identifier of ancestor.
Own ‘id’ if root.
- Optional schema:
- ‘origin_time’ : pl.UInt64
Number of generations elapsed from ancestor.
Determines branch lengths.
If this column is absent, all branches are assumed to be length 1.
- ‘extant’ : pl.Boolean
Should an entry corresponding to this phylogeny taxon be included in the output population?
If this column is absent, all tips are considered extant and all inner nodes are not.
Additional user-defined columns will be forwarded to the output DataFrame.
- dstream_algo : str
Name of downstream curation algorithm to use.
- dstream_S : int
Capacity of annotation dstream buffer, in number of data items.
- progress_wrap : Callable, optional
Pass tqdm or equivalent to display a progress bar.
- stratum_differentia_bit_width : int
The bit width of the generated differentia.
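The input conventions above (a root lists its own ‘id’ as ‘ancestor_id’; without an ‘extant’ column, tips are extant and inner nodes are not) can be illustrated with a small pure-Python sketch. It does not call hstrat or polars; the helper name `infer_extant` and the four-taxon phylogeny are hypothetical, chosen only to show the rule.

```python
from typing import Dict, List


def infer_extant(rows: List[Dict[str, int]]) -> Dict[int, bool]:
    """Hypothetical helper: mark tips as extant and inner nodes as not.

    Mirrors the documented default used when no 'extant' column is given.
    """
    has_child = set()
    for row in rows:
        # A root lists its own id as ancestor_id, so it is not its own child.
        if row["ancestor_id"] != row["id"]:
            has_child.add(row["ancestor_id"])
    return {row["id"]: row["id"] not in has_child for row in rows}


# Illustrative phylogeny: taxon 0 is the root, 1 is an inner node,
# and 2 and 3 are tips.
phylogeny = [
    {"id": 0, "ancestor_id": 0},
    {"id": 1, "ancestor_id": 0},
    {"id": 2, "ancestor_id": 0},
    {"id": 3, "ancestor_id": 1},
]

extant = infer_extant(phylogeny)
print(extant)
```

Under this rule, only taxa 2 and 3 would receive entries in the output population.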
Returns
- pl.DataFrame
The output DataFrame containing generated hstrat surface annotations.
- Required schema:
- ‘data_hex’ : pl.String
Raw genome data, with serialized dstream buffer and counter.
Represented as a hexadecimal string.
- ‘downstream_version’ : pl.Categorical
Version of downstream library used.
- ‘dstream_algo’ : pl.Categorical
Name of downstream curation algorithm used.
e.g., ‘dstream.steady_algo’
- ‘dstream_storage_bitoffset’ : pl.UInt32
Position of dstream buffer field in ‘data_hex’.
- ‘dstream_storage_bitwidth’ : pl.UInt32
Size of dstream buffer field in ‘data_hex’.
- ‘dstream_T_bitoffset’ : pl.UInt32
Position of dstream counter field in ‘data_hex’.
- ‘dstream_T_bitwidth’ : pl.UInt32
Size of dstream counter field in ‘data_hex’.
- ‘dstream_S’ : pl.UInt32
Capacity of dstream buffer, in number of data items.
- ‘origin_time’ : pl.UInt64
Number of generations elapsed since the founding ancestor.
- ‘td_source_id’ : pl.UInt64
Corresponding taxon identifier in source phylogeny.
Additional user-defined columns will be forwarded from the input DataFrame.
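As a rough illustration of how the bit offset and bit width columns locate fields within ‘data_hex’, the sketch below slices a hexadecimal string by nibble. Several things here are assumptions rather than documented behavior: that offsets count from the start of the string, that fields are aligned to 4-bit boundaries, and that the counter can be read as a big-endian integer. The helper name `extract_field_hex` and the example row (including its field layout and values) are hypothetical.

```python
def extract_field_hex(data_hex: str, bitoffset: int, bitwidth: int) -> str:
    """Hypothetical helper: return the hex digits covering one field.

    Assumes bit offsets count from the start of the string and that the
    field is nibble-aligned (each hex character encodes 4 bits).
    """
    if bitoffset % 4 or bitwidth % 4:
        raise ValueError("this sketch only handles nibble-aligned fields")
    start = bitoffset // 4
    return data_hex[start : start + bitwidth // 4]


# Made-up row for illustration only; real layouts come from the output
# DataFrame's offset and width columns.
row = {
    "data_hex": "0000002a" + "deadbeefcafef00d",
    "dstream_T_bitoffset": 0,
    "dstream_T_bitwidth": 32,
    "dstream_storage_bitoffset": 32,
    "dstream_storage_bitwidth": 64,
}

counter_hex = extract_field_hex(
    row["data_hex"], row["dstream_T_bitoffset"], row["dstream_T_bitwidth"]
)
storage_hex = extract_field_hex(
    row["data_hex"],
    row["dstream_storage_bitoffset"],
    row["dstream_storage_bitwidth"],
)

# Reading the counter as a big-endian integer is an assumption.
counter = int(counter_hex, 16)
print(counter_hex, storage_hex, counter)
```

For actual decoding, the library's own deserialization routines should be preferred over manual slicing.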
Notes
Input columns “id”, “ancestor_id”, and “ancestor_list” are not forwarded to output, to avoid conflicts with the output schema for subsequent phylogeny reconstruction.