surface_test_drive

surface_test_drive(df: polars.lazyframe.frame.LazyFrame, *, dstream_algo: str, dstream_S: int, dstream_T_bitwidth: int = 32, progress_wrap: typing.Callable = <function <lambda>>, stratum_differentia_bit_width: int) -> DataFrame

Reads an alife standard phylogeny dataframe and creates a population of hstrat surface annotations corresponding to the phylogeny tips, “as if” they had evolved according to the provided phylogeny history.

Parameters

df : pl.DataFrame

The input DataFrame containing an alife standard phylogeny with the required columns, one row per taxon.

Note that the alife-standard ancestor_list column is not required.

Required schema:
  • ‘id’ : pl.UInt64
    • Taxon identifier.

  • ‘ancestor_id’ : pl.UInt64
    • Taxon identifier of ancestor.

    • Own ‘id’ if root.

Optional schema:
  • ‘origin_time’ : pl.UInt64
    • Number of generations elapsed from ancestor.

    • Determines branch lengths.

    • If omitted, all branches are assumed to have length 1.

  • ‘extant’ : pl.Boolean
    • Should an entry corresponding to this phylogeny taxon be included in the output population?

    • If omitted, all tips are considered extant and all inner nodes are not.

  • Additional user-defined columns will be forwarded to the output DataFrame.

dstream_algo : str

Name of downstream curation algorithm to use.

dstream_S : int

Capacity of annotation dstream buffer, in number of data items.

progress_wrap : Callable, optional

Pass tqdm or equivalent to display a progress bar.
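For readers without tqdm, a tqdm-like wrapper might look like the sketch below. It assumes that `progress_wrap` is called with an iterable and must return an iterable yielding the same items, as tqdm does; that calling convention is not stated on this page. The helper name `make_logging_wrap` and its `every` parameter are hypothetical.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def make_logging_wrap(every: int = 1000) -> Callable[..., Iterator]:
    """Build a hypothetical tqdm-like wrapper that prints periodic counts."""

    def wrap(iterable: Iterable[T], *args, **kwargs) -> Iterator[T]:
        # Extra positional and keyword arguments are accepted and ignored,
        # so the wrapper tolerates tqdm-style options such as total=...
        for count, item in enumerate(iterable, start=1):
            if count % every == 0:
                print(f"processed {count} items")
            yield item  # items pass through unchanged

    return wrap


# Assumed usage: hand the wrapper over via the progress_wrap keyword.
logging_wrap = make_logging_wrap(every=2)
```

Note that, unlike a tqdm object, this generator-based wrapper does not support `len()`.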

stratum_differentia_bit_width : int

The bit width of the generated differentia.

Returns

pl.DataFrame

The output DataFrame containing generated hstrat surface annotations.

Required schema:
  • ‘data_hex’ : pl.String
    • Raw genome data, with serialized dstream buffer and counter.

    • Represented as a hexadecimal string.

  • ‘downstream_version’ : pl.Categorical
    • Version of downstream library used.

  • ‘dstream_algo’ : pl.Categorical
    • Name of downstream curation algorithm used.

    • e.g., ‘dstream.steady_algo’

  • ‘dstream_storage_bitoffset’ : pl.UInt32
    • Position of dstream buffer field in ‘data_hex’.

  • ‘dstream_storage_bitwidth’ : pl.UInt32
    • Size of dstream buffer field in ‘data_hex’.

  • ‘dstream_T_bitoffset’ : pl.UInt32
    • Position of dstream counter field in ‘data_hex’.

  • ‘dstream_T_bitwidth’ : pl.UInt32
    • Size of dstream counter field in ‘data_hex’.

  • ‘dstream_S’ : pl.UInt32
    • Capacity of dstream buffer, in number of data items.

  • ‘origin_time’ : pl.UInt64
    • Number of generations elapsed since the founding ancestor.

  • ‘td_source_id’ : pl.UInt64
    • Corresponding taxon identifier in source phylogeny.

Additional user-defined columns will be forwarded from the input DataFrame.
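The bit offset and bit width columns describe where each field sits inside ‘data_hex’. The sketch below shows one way such a field could be read back from a row. It assumes offsets are counted from the most significant bit of the hex string and that fields are stored big-endian; neither convention is confirmed on this page, and the helper `extract_bitfield` is hypothetical.

```python
def extract_bitfield(data_hex: str, bitoffset: int, bitwidth: int) -> int:
    """Read an unsigned integer field out of a hexadecimal string.

    Assumes (unverified) that bitoffset counts from the most significant
    bit of data_hex and that the field is stored big-endian.
    """
    total_bits = 4 * len(data_hex)  # each hex digit encodes 4 bits
    shift = total_bits - bitoffset - bitwidth
    if bitoffset < 0 or bitwidth < 0 or shift < 0:
        raise ValueError("field lies outside of data_hex")
    # Shift the field down to the low bits, then mask off higher bits.
    return (int(data_hex, 16) >> shift) & ((1 << bitwidth) - 1)


# Made-up row: a 32-bit counter followed by a 16-bit storage buffer.
example_hex = "0000002aff00"
counter = extract_bitfield(example_hex, bitoffset=0, bitwidth=32)  # 42
storage = extract_bitfield(example_hex, bitoffset=32, bitwidth=16)  # 65280
```

In practice the offsets and widths would be taken from the ‘dstream_T_bitoffset’, ‘dstream_T_bitwidth’, ‘dstream_storage_bitoffset’, and ‘dstream_storage_bitwidth’ columns of the same row.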

Notes

  • Input columns “id”, “ancestor_id”, and “ancestor_list” are not forwarded to output, to avoid conflicts with the output schema for subsequent phylogeny reconstruction.