serialization

Tools to load and save hereditary stratigraphic columns.

Functions

assemblage_from_dstream_df(df[, progress_wrap])

Deserialize a HereditaryStratigraphicAssemblage from a pandas DataFrame containing dstream surface data.

assemblage_from_records(records[, ...])

Deserialize a population of HereditaryStratigraphicColumns into a HereditaryStratigraphicAssemblage from a dict composed of builtin types.

col_from_int(value, differentia_bit_width, ...)

Deserialize a HereditaryStratigraphicColumn from an integer representation.

col_from_packet(packet, ...[, ...])

Deserialize a HereditaryStratigraphicColumn from a differentia packet and column configuration specification information.

col_from_packet_buffer(packet_buffer, ...[, ...])

Deserialize a HereditaryStratigraphicColumn from a buffer containing the differentia packet at the front, then stored differentia values.

col_from_records(records[, ...])

Deserialize a HereditaryStratigraphicColumn from a dict composed of builtin data types.

col_to_dataframe(column)

Create a pandas DataFrame with retained strata as rows.

col_to_int(column[, ...])

Serialize a HereditaryStratigraphicColumn to a binary representation as a Python int.

col_to_packet(column[, ...])

Serialize a HereditaryStratigraphicColumn to a binary buffer.

col_to_records(column)

Serialize a HereditaryStratigraphicColumn to a dict composed of builtin types.

col_to_specimen(column)

Create a postprocessing representation of the differentia retained by an extant HereditaryStratigraphicColumn, indexed by deposition rank.

pack_differentiae(strata, differentia_bit_width)

Pack a sequence of differentiae together into a compact representation.

pack_differentiae_bytes(strata, ...[, ...])

Pack a sequence of differentiae together into a compact representation.

pack_differentiae_str(strata, ...)

Pack a sequence of differentiae together into a compact string representation.

policy_from_records(records)

Deserialize a stratum retention policy from a dict composed of builtin data types.

policy_to_records(policy)

Serialize a stratum retention policy to a dict composed of builtin types.

pop_from_records(records[, progress_wrap, ...])

Deserialize a sequence of HereditaryStratigraphicColumns from a dict composed of builtin types.

pop_to_assemblage(columns[, progress_wrap])

Create a postprocessing representation of the differentia retained by a collection of HereditaryStratigraphicColumns.

pop_to_dataframe(columns[, progress_wrap])

Create a pandas DataFrame summarizing several columns, with retained strata as rows.

pop_to_records(columns[, progress_wrap])

Serialize a sequence of HereditaryStratigraphicColumns to a dict composed of builtin types.

specimen_from_records(records)

Deserialize a HereditaryStratigraphicSpecimen from a dict composed of builtin data types.

surf_from_hex(hex_string, dstream_algo, *, ...)

Deserialize a HereditaryStratigraphicSurface object from a hex string representation.

surf_to_hex(surface, *[, dstream_T_bitwidth])

Serialize a HereditaryStratigraphicSurface object into a hex string representation.

surf_to_specimen(surface)

Convert a HereditaryStratigraphicSurface to a HereditaryStratigraphicSpecimen.

unassemblage_from_records(records[, ...])

Deserialize a population of HereditaryStratigraphicColumns into a list of HereditaryStratigraphicSpecimens from a dict composed of builtin types.

unpack_differentiae(packed_differentiae, ...)

Unpack a compact, concatenated base 64 representation into a sequence with each element represented as a distinct integer.

unpack_differentiae_bytes(...[, ...])

Unpack a compact, concatenated byte buffer representation into a sequence with each element represented as a distinct integer.

unpack_differentiae_str(packed_differentiae, ...)

Unpack a compact, concatenated base 64 representation into a sequence with each element represented as a distinct integer.

assemblage_from_dstream_df(df: DataFrame, progress_wrap: Callable = <function <lambda>>) → HereditaryStratigraphicAssemblage

Deserialize a HereditaryStratigraphicAssemblage from a pandas DataFrame containing dstream surface data.

Each row of the DataFrame represents a single hereditary stratigraphic surface, serialized as a hex string with associated dstream metadata. Surfaces are deserialized, converted to specimens, and assembled into a HereditaryStratigraphicAssemblage.

Parameters

df: pd.DataFrame

DataFrame with dstream surface data.

Required schema:
  • ‘data_hex’: string

    Raw genome data as a hexadecimal string.

  • ‘dstream_algo’: string or categorical

    Name of downstream curation algorithm (e.g., 'dstream.steady_algo').

  • ‘dstream_storage_bitoffset’: integer

    Bit offset of the dstream buffer field in data_hex.

  • ‘dstream_storage_bitwidth’: integer

    Bit width of the dstream buffer field in data_hex.

  • ‘dstream_T_bitoffset’: integer

    Bit offset of the dstream counter (“rank”) field in data_hex.

  • ‘dstream_T_bitwidth’: integer

    Bit width of the dstream counter field in data_hex.

  • ‘dstream_S’: integer

    Capacity of the dstream buffer (number of differentia stored per annotation).

progress_wrap: callable, optional

Wrapper applied to the row iterator, e.g. tqdm.tqdm for a progress bar. Must accept and return an iterable. Default is the identity function (no wrapping).
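The progress_wrap contract described above (accept an iterable, return an iterable) can be illustrated with a small standalone sketch. The names identity, make_counting_wrap, and process_rows are hypothetical stand-ins written for this example; they are not part of the library, and tqdm.tqdm would be passed in the same position as the counting wrapper.

```python
from typing import Callable, Iterable, Iterator, List


def identity(iterable: Iterable) -> Iterable:
    """Default-style wrapper: hand the iterable back unchanged."""
    return iterable


def make_counting_wrap(log: List[int]) -> Callable[[Iterable], Iterator]:
    """Build a hypothetical wrapper that records how many items passed through."""

    def wrap(iterable: Iterable) -> Iterator:
        for count, item in enumerate(iterable, start=1):
            log.append(count)  # a progress bar would update its display here
            yield item

    return wrap


def process_rows(rows: Iterable[str], progress_wrap: Callable = identity) -> List[str]:
    """Hypothetical consumer: iterates rows through progress_wrap."""
    return [row.upper() for row in progress_wrap(rows)]


rows = ["0a", "ff", "01"]
seen: List[int] = []
plain = process_rows(rows)
counted = process_rows(rows, progress_wrap=make_counting_wrap(seen))
```

Because the wrapper only observes the items, the result is the same with or without it.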

Returns

HereditaryStratigraphicAssemblage

Assemblage built from the deserialized surfaces.

Raises

ValueError

If any required column is missing from the DataFrame.
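As a rough illustration of the required schema and the missing-column error, the sketch below checks a set of column names against the documented list. The helpers find_missing_columns and check_schema are hypothetical, the example row values are purely illustrative (they are not validated against any dstream algorithm), and the error message is not the library's own.

```python
from typing import Iterable, List

# Column names taken from the "Required schema" list above.
REQUIRED_COLUMNS = (
    "data_hex",
    "dstream_algo",
    "dstream_storage_bitoffset",
    "dstream_storage_bitwidth",
    "dstream_T_bitoffset",
    "dstream_T_bitwidth",
    "dstream_S",
)


def find_missing_columns(columns: Iterable[str]) -> List[str]:
    """Return required column names absent from `columns`, in schema order."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]


def check_schema(columns: Iterable[str]) -> None:
    """Hypothetical pre-check: raise ValueError if a required column is missing."""
    missing = find_missing_columns(columns)
    if missing:
        raise ValueError(f"missing required columns: {missing}")


# Illustrative row: 32-bit counter followed by 16 bits of storage (48 bits = 12 hex chars).
example_row = {
    "data_hex": "0000000aff01",
    "dstream_algo": "dstream.steady_algo",
    "dstream_storage_bitoffset": 32,
    "dstream_storage_bitwidth": 16,
    "dstream_T_bitoffset": 0,
    "dstream_T_bitwidth": 32,
    "dstream_S": 2,
}
check_schema(example_row.keys())  # passes silently
```

With a pandas DataFrame, the same check could be applied to df.columns before calling the deserializer.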

See Also

surf_from_hex

Deserialize a single surface from a hex string.

pop_to_assemblage

Create an assemblage from a collection of HereditaryStratigraphicColumn objects.

assemblage_from_records

Deserialize an assemblage from a dict of builtin types.

assemblage_from_records(records: Dict, progress_wrap: Callable = <function <lambda>>, mutate: bool = False) → HereditaryStratigraphicAssemblage

Deserialize a population of HereditaryStratigraphicColumns into a HereditaryStratigraphicAssemblage from a dict composed of builtin types.

Parameters

records: dict

Data to deserialize.

progress_wrap: Callable, optional

Wrapper applied around generation iterator and row generator for final phylogeny compilation process.

Pass tqdm or equivalent to display progress bars.

mutate: bool, default False

Are side effects on the input argument records allowed?

See Also

HereditaryStratigraphicAssemblage

A collection of HereditaryStratigraphicSpecimens, padded to include entries for all ranks retained by any specimen within the assemblage.

col_from_int(value: int, differentia_bit_width: int, stratum_retention_policy: Callable, differentiae_byte_bit_order: Literal['big', 'little'] = 'big', num_strata_deposited_byte_order: Literal['big', 'little'] = 'big', num_strata_deposited_byte_width: int = 4, value_byte_width: int | None = None) → HereditaryStratigraphicColumn

Deserialize a HereditaryStratigraphicColumn from an integer representation.

Integer representation is packet binary representation plus a sentry bit at the most significant bit position. Sentry bit prevents loss of leading zero bits. If value_byte_width is not None, an appropriate sentry bit is added to the value if it is not already present.

Assumes big-endian byte order.
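The role of the sentry bit can be shown with plain Python integers. This is a minimal sketch of the idea only, assuming the sentry is a single 1 bit placed directly above the packet's most significant byte; packet_to_int and int_to_packet are hypothetical helpers, not the library's implementation.

```python
from typing import Optional


def packet_to_int(packet: bytes) -> int:
    """Interpret packet as a big-endian int, with a sentry 1 bit above its top byte."""
    sentry = 1 << (8 * len(packet))
    return sentry | int.from_bytes(packet, byteorder="big")


def int_to_packet(value: int, value_byte_width: Optional[int] = None) -> bytes:
    """Recover the packet bytes, using the sentry bit to find the byte count."""
    if value_byte_width is not None:
        # Assumed behavior: add the sentry if the caller's value lacks it.
        value |= 1 << (8 * value_byte_width)
    num_bytes = (value.bit_length() - 1) // 8  # sentry occupies the top bit
    return (value ^ (1 << (8 * num_bytes))).to_bytes(num_bytes, byteorder="big")


packet = b"\x00\x00\x01\xff"  # two leading zero bytes
as_int = packet_to_int(packet)
restored = int_to_packet(as_int)
# Without the sentry, the leading zero bytes are not recoverable from the int alone.
bare = int.from_bytes(packet, byteorder="big")
restored_from_bare = int_to_packet(bare, value_byte_width=4)
```

Here the bare integer is 511, which by itself carries no record of the packet having been four bytes wide; the sentry bit (or an explicit byte width) supplies that information.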

col_from_packet(packet: Buffer, differentia_bit_width: int, stratum_retention_policy: Callable, differentiae_byte_bit_order: Literal['big', 'little'] = 'big', num_strata_deposited_byte_order: Literal['big', 'little'] = 'big', num_strata_deposited_byte_width: int = 4) → HereditaryStratigraphicColumn

Deserialize a HereditaryStratigraphicColumn from a differentia packet and column configuration specification information.

Use when buffer size equals packet size.

See Also

col_from_packet_buffer: use when buffer size exceeds packet size.

col_from_packet_buffer(packet_buffer: Buffer, differentia_bit_width: int, stratum_retention_policy: Callable, differentiae_byte_bit_order: Literal['big', 'little'] = 'big', num_strata_deposited_byte_order: Literal['big', 'little'] = 'big', num_strata_deposited_byte_width: int = 4) → HereditaryStratigraphicColumn

Deserialize a HereditaryStratigraphicColumn from a buffer containing the differentia packet at the front, then stored differentia values.

Use when buffer size exceeds packet size.

See Also

col_from_packet: use when buffer size equals packet size.

col_from_records(records: Dict, differentiae_byte_bit_order: Literal['big', 'little'] = 'big') → HereditaryStratigraphicColumn

Deserialize a HereditaryStratigraphicColumn from a dict composed of builtin data types.

col_to_dataframe(column: HereditaryStratigraphicColumn) → DataFrame

Create a pandas DataFrame with retained strata as rows.

col_to_int(column: HereditaryStratigraphicColumn, num_strata_deposited_byte_order: Literal['big', 'little'] = 'big', num_strata_deposited_byte_width: int = 4) → int

Serialize a HereditaryStratigraphicColumn to a binary representation as a Python int.

Integer representation is packet binary representation plus a sentry bit at the most significant bit position, to prevent loss of leading zero bits.

Uses big-endian byte order.

col_to_packet(column: HereditaryStratigraphicColumn, num_strata_deposited_byte_order: Literal['big', 'little'] = 'big', num_strata_deposited_byte_width: int = 4) → Buffer

Serialize a HereditaryStratigraphicColumn to a binary buffer.
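The num_strata_deposited_byte_order and num_strata_deposited_byte_width parameters suggest a fixed-width deposition counter stored alongside the packed differentiae. The sketch below assumes, for illustration only, that the counter sits at the front of the packet and the packed differentiae follow; build_packet and split_packet are hypothetical helpers, and the actual layout should be confirmed against the library.

```python
from typing import Tuple


def build_packet(
    num_strata_deposited: int,
    packed_differentiae: bytes,
    byte_order: str = "big",
    byte_width: int = 4,
) -> bytes:
    """Assumed layout: fixed-width counter header, then packed differentiae."""
    header = num_strata_deposited.to_bytes(byte_width, byteorder=byte_order)
    return header + packed_differentiae


def split_packet(
    packet: bytes, byte_order: str = "big", byte_width: int = 4
) -> Tuple[int, bytes]:
    """Inverse of build_packet under the same assumed layout."""
    count = int.from_bytes(packet[:byte_width], byteorder=byte_order)
    return count, bytes(packet[byte_width:])


packet = build_packet(10, b"\xa0")
count, body = split_packet(packet)
little = build_packet(10, b"", byte_order="little", byte_width=2)
```

The byte order and width must match between serialization and deserialization, which is why both appear as parameters on each side.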

col_to_records(column: HereditaryStratigraphicColumn) → Dict

Serialize a HereditaryStratigraphicColumn to a dict composed of builtin types.

col_to_specimen(column: HereditaryStratigraphicColumn) → HereditaryStratigraphicSpecimen

Create a postprocessing representation of the differentia retained by an extant HereditaryStratigraphicColumn, indexed by deposition rank.

pack_differentiae(strata: Iterable[HereditaryStratum], differentia_bit_width: int) → str

Pack a sequence of differentiae together into a compact representation.

Returns a string with the base 64 encoded concatenation of differentiae. If differentia_bit_width is not a whole multiple of 8 bits, the first encoded byte tells how many empty padding bits, if any, were placed at the end of the concatenation in order to align the bitstring end to a byte boundary.

Deprecated since version 1.8.0: Use pack_differentiae_str instead.

pack_differentiae_bytes(strata: Iterable[HereditaryStratum], differentia_bit_width: int, always_omit_num_padding_bits_header: bool = False) → Buffer

Pack a sequence of differentiae together into a compact representation.

Returns a byte buffer holding the concatenation of differentiae. If differentia_bit_width is not a whole multiple of 8 bits, the first encoded byte tells how many empty padding bits, if any, were placed at the end of the concatenation in order to align the bitstring end to a byte boundary.

pack_differentiae_str(strata: Iterable[HereditaryStratum], differentia_bit_width: int) → str

Pack a sequence of differentiae together into a compact string representation.

Returns a string with the base 64 encoded concatenation of differentiae. If differentia_bit_width is not a whole multiple of 8 bits, the first encoded byte tells how many empty padding bits, if any, were placed at the end of the concatenation in order to align the bitstring end to a byte boundary.
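The padding-header scheme described above can be sketched with the standard library alone. This is an illustrative reimplementation operating on plain integers rather than HereditaryStratum objects; pack_sketch and unpack_sketch are hypothetical names, and the exact bytes produced by the library may differ.

```python
import base64
from typing import Iterable, List


def _bits_to_bytes(bits: str) -> bytes:
    """Convert a bit string whose length is a multiple of 8 into bytes."""
    return int(bits, 2).to_bytes(len(bits) // 8, "big") if bits else b""


def pack_sketch(differentiae: Iterable[int], bit_width: int) -> bytes:
    """Concatenate fixed-width values; prepend a padding-count byte if needed."""
    values = list(differentiae)
    if any(v < 0 or v >> bit_width for v in values):
        raise ValueError("differentia does not fit in bit_width")
    bits = "".join(format(v, f"0{bit_width}b") for v in values)
    if bit_width % 8 == 0:
        return _bits_to_bytes(bits)  # already byte aligned, no header
    num_padding_bits = -len(bits) % 8
    return bytes([num_padding_bits]) + _bits_to_bytes(bits + "0" * num_padding_bits)


def unpack_sketch(packed: bytes, bit_width: int) -> List[int]:
    """Inverse of pack_sketch: strip header and padding, then split into values."""
    if bit_width % 8 == 0:
        num_padding_bits, body = 0, packed
    else:
        num_padding_bits, body = packed[0], packed[1:]
    bits = "".join(format(byte, "08b") for byte in body)
    bits = bits[: len(bits) - num_padding_bits]
    return [int(bits[i : i + bit_width], 2) for i in range(0, len(bits), bit_width)]


packed = pack_sketch([1, 0, 1], bit_width=1)  # 3 data bits plus 5 padding bits
encoded = base64.b64encode(packed).decode("ascii")
decoded = unpack_sketch(base64.b64decode(encoded), bit_width=1)
```

In this sketch, three 1-bit differentiae pack to a header byte of 5 followed by one data byte, and the header lets the reader discard the five trailing padding bits instead of mistaking them for zero-valued differentiae.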

policy_from_records(records: Dict) → Callable

Deserialize a stratum retention policy from a dict composed of builtin data types.

policy_to_records(policy: Callable) → Dict

Serialize a stratum retention policy to a dict composed of builtin types.

pop_from_records(records: Dict, progress_wrap: Callable = <function <lambda>>, mutate: bool = False) → List[HereditaryStratigraphicColumn]

Deserialize a sequence of HereditaryStratigraphicColumns from a dict composed of builtin types.

Parameters

records: dict

Data to deserialize.

progress_wrap: Callable, optional

Wrapper applied around generation iterator and row generator for final phylogeny compilation process.

Pass tqdm or equivalent to display progress bars.

mutate: bool, default False

Are side effects on the input argument records allowed?

Returns

population: List[HereditaryStratigraphicColumn]

Deserialized population of HereditaryStratigraphicColumns.

pop_to_assemblage(columns: Iterable[HereditaryStratigraphicColumn], progress_wrap: Callable = <function <lambda>>) → HereditaryStratigraphicAssemblage

Create a postprocessing representation of the differentia retained by a collection of HereditaryStratigraphicColumns.

Parameters

columns: iterable of HereditaryStratigraphicColumn

Data to serialize.

progress_wrap: Callable, optional

Wrapper applied around generation iterator and row generator for final phylogeny compilation process.

Pass tqdm or equivalent to display progress bars.

pop_to_dataframe(columns: Iterable[HereditaryStratigraphicColumn], progress_wrap: Callable = <function <lambda>>) → DataFrame

Create a pandas DataFrame summarizing several columns, with retained strata as rows.

Parameters

columns: iterable of HereditaryStratigraphicColumn

Data to serialize.

progress_wrap: Callable, optional

Wrapper applied around generation iterator and row generator for final phylogeny compilation process.

Pass tqdm or equivalent to display progress bars.

pop_to_records(columns: Iterable[HereditaryStratigraphicColumn], progress_wrap: Callable = <function <lambda>>) → Dict

Serialize a sequence of HereditaryStratigraphicColumns to a dict composed of builtin types.

Parameters

columns: iterable of HereditaryStratigraphicColumn

Data to serialize.

progress_wrap: Callable, default identity function

Wrapper applied around generation iterator and row generator for final phylogeny compilation process.

Pass tqdm or equivalent to display progress bars.

specimen_from_records(records: Dict) → HereditaryStratigraphicSpecimen

Deserialize a HereditaryStratigraphicSpecimen from a dict composed of builtin data types.

See Also

HereditaryStratigraphicSpecimen

Postprocessing representation of the differentia retained by an extant HereditaryStratigraphicColumn, indexed by deposition rank.

surf_from_hex(hex_string: str, dstream_algo: ModuleType, *, dstream_S: int, dstream_storage_bitoffset: int | None = None, dstream_storage_bitwidth: int, dstream_T_bitoffset: int = 0, dstream_T_bitwidth: int = 32) → HereditaryStratigraphicSurface

Deserialize a HereditaryStratigraphicSurface object from a hex string representation.

Hex string representation needs exactly two contiguous parts:
  1. dstream_T (the number of depositions elapsed), and

  2. dstream_storage (which holds all the stored differentiae).

Data in hex string representation should use big-endian byte order.

Parameters

hex_string: str

Hex string to be parsed, which can be uppercase or lowercase.

dstream_algo: module

Dstream algorithm for curation of retained differentia.

dstream_storage_bitoffset: int, default dstream_T_bitwidth

Number of bits before the storage.

dstream_storage_bitwidth: int

Number of bits used for storage.

dstream_T_bitoffset: int, default 0

Number of bits before dstream_T.

dstream_T_bitwidth: int, default 32

Number of bits used to store dstream_T.

dstream_S: int

Number of buffer sites available to store differentiae.

Determines how many differentiae are unpacked from storage.
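How the bit offsets and widths select fields from the hex string can be sketched without the library. The helper split_surface_hex below is hypothetical and only slices the two fields out of the string; it does not apply any dstream algorithm, and it assumes equal-width storage items whose width is dstream_storage_bitwidth divided by dstream_S.

```python
from typing import List, Optional, Tuple


def split_surface_hex(
    hex_string: str,
    *,
    dstream_S: int,
    dstream_storage_bitwidth: int,
    dstream_storage_bitoffset: Optional[int] = None,
    dstream_T_bitoffset: int = 0,
    dstream_T_bitwidth: int = 32,
) -> Tuple[int, List[int]]:
    """Return (dstream_T, storage items) sliced from a big-endian hex string."""
    if dstream_storage_bitoffset is None:
        dstream_storage_bitoffset = dstream_T_bitwidth  # documented default
    num_bits = 4 * len(hex_string)
    bits = format(int(hex_string, 16), f"0{num_bits}b")  # keeps leading zeros
    t_bits = bits[dstream_T_bitoffset : dstream_T_bitoffset + dstream_T_bitwidth]
    storage_bits = bits[
        dstream_storage_bitoffset : dstream_storage_bitoffset + dstream_storage_bitwidth
    ]
    item_bitwidth = dstream_storage_bitwidth // dstream_S  # assumed equal-width items
    items = [
        int(storage_bits[i : i + item_bitwidth], 2)
        for i in range(0, dstream_storage_bitwidth, item_bitwidth)
    ]
    return int(t_bits, 2), items


# 32-bit counter (value 10) followed by two 8-bit storage items.
default_layout = split_surface_hex("0000000aff01", dstream_S=2, dstream_storage_bitwidth=16)
# Storage first, counter second, selected purely through the offsets.
swapped_layout = split_surface_hex(
    "ff0000000a",
    dstream_S=1,
    dstream_storage_bitwidth=8,
    dstream_storage_bitoffset=0,
    dstream_T_bitoffset=8,
)
```

The second call shows that the offsets, not the order of arguments, decide where each field is read from.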

See Also

surf_to_hex()

Serialize a surface into a hex string.

surf_to_hex(surface: HereditaryStratigraphicSurface, *, dstream_T_bitwidth: int = 32) → str

Serialize a HereditaryStratigraphicSurface object into a hex string representation.

Serialized data comprises two components:
  1. dstream_T (the number of depositions elapsed) and

  2. dstream_storage (binary data of differentia values).

The hex layout used is:

  0x…
  ########**************************************************
  ^       ^
  |       dstream_storage, length = item_bitwidth / 4 * dstream_S
  dstream_T, length = dstream_T_bitwidth / 4

This hex string can be reconstituted into a HereditaryStratigraphicSurface object by calling HereditaryStratigraphicSurface.from_hex() with the following parameters:

  • dstream_T_bitoffset = 0

  • dstream_T_bitwidth = dstream_T_bitwidth

  • dstream_storage_bitoffset = dstream_T_bitwidth

  • dstream_storage_bitwidth = self.S * item_bitwidth
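The layout and length formulas above can be checked with a short standalone sketch that assembles the two fields by hand. The function surface_fields_to_hex is a hypothetical helper working on plain integers, not the library's serializer, and it assumes both bit widths are multiples of 4 so that each field occupies a whole number of hex characters.

```python
from typing import Sequence


def surface_fields_to_hex(
    dstream_T: int,
    storage_items: Sequence[int],
    *,
    item_bitwidth: int,
    dstream_T_bitwidth: int = 32,
) -> str:
    """Concatenate a zero-padded counter field and fixed-width storage items."""
    if dstream_T_bitwidth % 4 or item_bitwidth % 4:
        raise ValueError("bit widths must be multiples of 4 for a hex layout")
    t_hex = format(dstream_T, f"0{dstream_T_bitwidth // 4}x")
    storage_hex = "".join(
        format(item, f"0{item_bitwidth // 4}x") for item in storage_items
    )
    return t_hex + storage_hex


hex_string = surface_fields_to_hex(10, [255, 1], item_bitwidth=8)
# Expected length: dstream_T_bitwidth / 4 + item_bitwidth / 4 * dstream_S
expected_length = 32 // 4 + 8 // 4 * 2
```

Here the counter occupies the first 8 hex characters and the two 8-bit items occupy the remaining 4, matching the lengths given in the layout.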

Parameters

item_bitwidth: int

Number of storage bits used per differentia.

dstream_T_bitwidth: int, default 32

Number of bits used to store count of elapsed depositions.

See Also

surf_from_hex()

Deserialize a surface from a hex string.

surf_to_specimen(surface: HereditaryStratigraphicSurface) → HereditaryStratigraphicSpecimen

Convert a HereditaryStratigraphicSurface to a HereditaryStratigraphicSpecimen.

Parameters

surface: HereditaryStratigraphicSurface

The surface to convert.

Returns

HereditaryStratigraphicSpecimen

Specimen with differentia indexed by retained ranks.

See Also

col_to_specimen

Convert a HereditaryStratigraphicColumn to a specimen.

unassemblage_from_records(records: Dict, progress_wrap: Callable = <function <lambda>>, mutate: bool = False) → List[HereditaryStratigraphicSpecimen]

Deserialize a population of HereditaryStratigraphicColumns into a list of HereditaryStratigraphicSpecimens from a dict composed of builtin types.

Parameters

records: dict

Data to deserialize.

progress_wrap: Callable, optional

Wrapper applied around generation iterator and row generator for final phylogeny compilation process.

Pass tqdm or equivalent to display progress bars.

mutate: bool, default False

Are side effects on the input argument records allowed?

See Also

HereditaryStratigraphicSpecimen

Postprocessing representation of the differentia retained by an extant HereditaryStratigraphicColumn, indexed by deposition rank.

HereditaryStratigraphicAssemblage

A collection of HereditaryStratigraphicSpecimens, padded to include entries for all ranks retained by any specimen within the assemblage.

unpack_differentiae(packed_differentiae: str, differentia_bit_width: int) → Iterator[int]

Unpack a compact, concatenated base 64 representation into a sequence with each element represented as a distinct integer.

Deprecated since version 1.8.0: Use unpack_differentiae_str instead.

unpack_differentiae_bytes(packed_differentiae: Buffer, differentia_bit_width: int, differentiae_byte_bit_order: Literal['big', 'little'] = 'big', num_packed_differentia: int | None = None) → Iterator[int]

Unpack a compact, concatenated byte buffer representation into a sequence with each element represented as a distinct integer.

unpack_differentiae_str(packed_differentiae: str, differentia_bit_width: int, differentiae_byte_bit_order: Literal['big', 'little'] = 'big', num_packed_differentia: int | None = None) → Iterator[int]

Unpack a compact, concatenated base 64 representation into a sequence with each element represented as a distinct integer.

Notes

Specifying num_packed_differentia implies that packed_differentiae has no header specifying the number of padding bits.
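The reason an explicit count can stand in for the padding header is sketched below: without either, trailing padding bits cannot be told apart from zero-valued differentiae. The helper unpack_headerless is a hypothetical standard-library illustration that assumes big-endian bit order; it is not the library's implementation.

```python
from typing import List


def unpack_headerless(
    packed: bytes, differentia_bit_width: int, num_packed_differentia: int
) -> List[int]:
    """Read exactly num_packed_differentia fixed-width values; ignore trailing bits."""
    bits = "".join(format(byte, "08b") for byte in packed)
    needed = differentia_bit_width * num_packed_differentia
    if needed > len(bits):
        raise ValueError("buffer too short for requested number of differentiae")
    return [
        int(bits[i : i + differentia_bit_width], 2)
        for i in range(0, needed, differentia_bit_width)
    ]


buffer = b"\xa0"  # bits 10100000
three = unpack_headerless(buffer, 1, 3)  # trailing 5 bits treated as padding
eight = unpack_headerless(buffer, 1, 8)  # same bytes, all 8 bits treated as data
```

The same buffer yields three or eight 1-bit values depending only on the count supplied, so the count must be known from elsewhere when the header is omitted.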