serialization
Tools to load and save hereditary stratigraphic columns.
Functions
| assemblage_from_dstream_df | Deserialize a HereditaryStratigraphicAssemblage from a pandas DataFrame containing dstream surface data. |
| assemblage_from_records | Deserialize a population of HereditaryStratigraphicColumns into a HereditaryStratigraphicAssemblage from a dict composed of builtin types. |
| col_from_int | Deserialize a HereditaryStratigraphicColumn from an integer representation. |
| col_from_packet | Deserialize a HereditaryStratigraphicColumn from a differentia packet and column configuration specification information. |
| col_from_packet_buffer | Deserialize a HereditaryStratigraphicColumn from a buffer containing the differentia packet at the front, then stored differentia values. |
| col_from_records | Deserialize a HereditaryStratigraphicColumn from a dict composed of builtin data types. |
| col_to_dataframe | Create a pandas DataFrame with retained strata as rows. |
| col_to_int | Serialize a HereditaryStratigraphicColumn to a binary representation as a Python int. |
| col_to_packet | Serialize a HereditaryStratigraphicColumn to a binary buffer. |
| col_to_records | Serialize a HereditaryStratigraphicColumn to a dict composed of builtin types. |
| col_to_specimen | Create a postprocessing representation of the differentia retained by an extant HereditaryStratigraphicColumn, indexed by deposition rank. |
| pack_differentiae | Pack a sequence of differentiae together into a compact representation. |
| pack_differentiae_bytes | Pack a sequence of differentiae together into a compact representation. |
| pack_differentiae_str | Pack a sequence of differentiae together into a compact string representation. |
| policy_from_records | Deserialize a stratum retention policy from a dict composed of builtin data types. |
| policy_to_records | Serialize a stratum retention policy to a dict composed of builtin types. |
| pop_from_records | Deserialize a sequence of HereditaryStratigraphicColumns from a dict composed of builtin types. |
| pop_to_assemblage | Create a postprocessing representation of the differentia retained by a collection of HereditaryStratigraphicColumns. |
| pop_to_dataframe | Create a pandas DataFrame summarizing several columns, with retained strata as rows. |
| pop_to_records | Serialize a sequence of HereditaryStratigraphicColumns to a dict composed of builtin types. |
| specimen_from_records | Deserialize a HereditaryStratigraphicSpecimen from a dict composed of builtin data types. |
| surf_from_hex | Deserialize a HereditaryStratigraphicSurface object from a hex string representation. |
| surf_to_hex | Serialize a HereditaryStratigraphicSurface object into a hex string representation. |
| surf_to_specimen | Convert a HereditaryStratigraphicSurface to a HereditaryStratigraphicSpecimen. |
| unassemblage_from_records | Deserialize a population of HereditaryStratigraphicColumns into a list of HereditaryStratigraphicSpecimens from a dict composed of builtin types. |
| unpack_differentiae | Unpack a compact, concatenated base 64 representation into a sequence with each element represented as a distinct integer. |
| unpack_differentiae_bytes | Unpack a compact, concatenated byte buffer representation into a sequence with each element represented as a distinct integer. |
| unpack_differentiae_str | Unpack a compact, concatenated base 64 representation into a sequence with each element represented as a distinct integer. |
- assemblage_from_dstream_df(df: DataFrame, progress_wrap: Callable = <function <lambda>>) HereditaryStratigraphicAssemblage
Deserialize a HereditaryStratigraphicAssemblage from a pandas DataFrame containing dstream surface data.
Each row of the DataFrame represents a single hereditary stratigraphic surface, serialized as a hex string with associated dstream metadata. Surfaces are deserialized, converted to specimens, and assembled into a HereditaryStratigraphicAssemblage.
Parameters
- df: pd.DataFrame
DataFrame with dstream surface data.
Required schema:
- ‘data_hex’: string
Raw genome data as a hexadecimal string.
- ‘dstream_algo’: string or categorical
Name of downstream curation algorithm (e.g., 'dstream.steady_algo').
- ‘dstream_storage_bitoffset’: integer
Bit offset of the dstream buffer field in data_hex.
- ‘dstream_storage_bitwidth’: integer
Bit width of the dstream buffer field in data_hex.
- ‘dstream_T_bitoffset’: integer
Bit offset of the dstream counter (“rank”) field in data_hex.
- ‘dstream_T_bitwidth’: integer
Bit width of the dstream counter field in data_hex.
- ‘dstream_S’: integer
Capacity of the dstream buffer (number of differentiae stored per annotation).
- progress_wrap: callable, optional
Wrapper applied to the row iterator, e.g. tqdm.tqdm for a progress bar. Must accept and return an iterable. Default is the identity function (no wrapping).
Returns
- HereditaryStratigraphicAssemblage
Assemblage built from the deserialized surfaces.
Raises
- ValueError
If any required column is missing from the DataFrame.
See Also
- surf_from_hex :
Deserialize a single surface from a hex string.
- pop_to_assemblage :
Create an assemblage from a collection of HereditaryStratigraphicColumn objects.
- assemblage_from_records :
Deserialize an assemblage from a dict of builtin types.
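Because a missing column raises ValueError, it can be convenient to check the schema before calling the function. The following is a minimal sketch of such a pre-flight check. `REQUIRED_DSTREAM_COLUMNS` and `missing_dstream_columns` are hypothetical names introduced here for illustration and are not part of the library; the column names are those listed under "Required schema" above.

```python
from typing import Iterable, List

# Column names copied from the "Required schema" list above.
REQUIRED_DSTREAM_COLUMNS = (
    "data_hex",
    "dstream_algo",
    "dstream_storage_bitoffset",
    "dstream_storage_bitwidth",
    "dstream_T_bitoffset",
    "dstream_T_bitwidth",
    "dstream_S",
)


def missing_dstream_columns(columns: Iterable[str]) -> List[str]:
    """Return the required column names absent from `columns`.

    Hypothetical helper, not a library function.
    """
    present = set(columns)
    return [name for name in REQUIRED_DSTREAM_COLUMNS if name not in present]


# Usage: pass `df.columns` of a pandas DataFrame, or any iterable of names.
example_columns = ["data_hex", "dstream_algo", "dstream_S"]
missing = missing_dstream_columns(example_columns)
if missing:
    print(f"missing columns: {missing}")
```

The helper accepts any iterable of names, so it works with `df.columns` without importing pandas itself.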
- assemblage_from_records(records: Dict, progress_wrap: Callable = <function <lambda>>, mutate: bool = False) HereditaryStratigraphicAssemblage
Deserialize a population of HereditaryStratigraphicColumns into a HereditaryStratigraphicAssemblage from a dict composed of builtin types.
Parameters
- records: dict
Data to deserialize.
- progress_wrap: Callable, optional
Wrapper applied around generation iterator and row generator for final phylogeny compilation process.
Pass tqdm or equivalent to display progress bars.
- mutate: bool, default False
Are side effects on the input argument records allowed?
See Also
- HereditaryStratigraphicAssemblage
A collection of HereditaryStratigraphicSpecimens, padded to include entries for all ranks retained by any specimen within the assemblage.
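The progress_wrap contract used throughout this module (a callable that accepts an iterable and returns an iterable) can be illustrated without tqdm. In this sketch, `logging_wrap` and `consume` are hypothetical stand-ins: `consume` only mimics how a function might apply the wrapper to its internal iterator and is not how the library is implemented.

```python
from typing import Callable, Iterable, List

progress_log: List[int] = []


def logging_wrap(iterable: Iterable) -> Iterable:
    """Yield items unchanged while recording how many have been seen."""
    for count, item in enumerate(iterable, start=1):
        progress_log.append(count)
        yield item


def consume(rows: Iterable[int], progress_wrap: Callable = lambda x: x) -> List[int]:
    """Hypothetical stand-in for a function that accepts progress_wrap."""
    return [row * 2 for row in progress_wrap(rows)]


result = consume([1, 2, 3], progress_wrap=logging_wrap)
print(result, progress_log)
```

Any wrapper with the same shape can be passed in place of `logging_wrap`; the default lambda is the identity function, so omitting the argument leaves iteration untouched.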
- col_from_int(value: int, differentia_bit_width: int, stratum_retention_policy: Callable, differentiae_byte_bit_order: Literal['big', 'little'] = 'big', num_strata_deposited_byte_order: Literal['big', 'little'] = 'big', num_strata_deposited_byte_width: int = 4, value_byte_width: int | None = None) HereditaryStratigraphicColumn
Deserialize a HereditaryStratigraphicColumn from an integer representation.
Integer representation is packet binary representation plus a sentry bit at the most significant bit position. Sentry bit prevents loss of leading zero bits. If value_byte_width is not None, an appropriate sentry bit is added to the value if it is not already present.
Assumes big endian byte order.
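The role of the sentry bit can be seen with plain Python integers. The sketch below is not the library's implementation: `packet_to_int` and `int_to_packet` are hypothetical helpers, and placing the sentry bit directly above the packet's most significant byte is an assumption made for illustration.

```python
def packet_to_int(packet: bytes) -> int:
    """Encode bytes as an int, with a sentry bit above the top byte."""
    sentry = 1 << (8 * len(packet))
    return sentry | int.from_bytes(packet, byteorder="big")


def int_to_packet(value: int) -> bytes:
    """Strip the sentry bit and recover the original bytes."""
    num_bytes = (value.bit_length() - 1) // 8
    sentry = 1 << (8 * num_bytes)
    return (value ^ sentry).to_bytes(num_bytes, byteorder="big")


packet = b"\x00\x00\x07"

# Without a sentry bit, the two leading zero bytes cannot be recovered.
naive = int.from_bytes(packet, byteorder="big")

# With a sentry bit, the packet length survives the round trip.
value = packet_to_int(packet)
restored = int_to_packet(value)
print(naive, value, restored)
```

The integer alone then carries the packet length, which is why no separate byte count is needed when a sentry bit is present.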
- col_from_packet(packet: Buffer, differentia_bit_width: int, stratum_retention_policy: Callable, differentiae_byte_bit_order: Literal['big', 'little'] = 'big', num_strata_deposited_byte_order: Literal['big', 'little'] = 'big', num_strata_deposited_byte_width: int = 4) HereditaryStratigraphicColumn
Deserialize a HereditaryStratigraphicColumn from a differentia packet and column configuration specification information.
Use when buffer size equals packet size.
See Also
col_from_packet_buffer: use when buffer size exceeds packet size.
- col_from_packet_buffer(packet_buffer: Buffer, differentia_bit_width: int, stratum_retention_policy: Callable, differentiae_byte_bit_order: Literal['big', 'little'] = 'big', num_strata_deposited_byte_order: Literal['big', 'little'] = 'big', num_strata_deposited_byte_width: int = 4) HereditaryStratigraphicColumn
Deserialize a HereditaryStratigraphicColumn from a buffer containing the differentia packet at the front, then stored differentia values.
Use when buffer size exceeds packet size.
See Also
col_from_packet: use when buffer size equals packet size.
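The parameter names suggest that a packet begins with the deposition count, stored in num_strata_deposited_byte_width bytes, followed by the packed differentiae. Under that assumption, which the text above does not confirm, reading the header from the front of a buffer might look like the sketch below. `split_packet_header` is a hypothetical helper, not a library function.

```python
from typing import Tuple


def split_packet_header(
    buffer: bytes,
    num_strata_deposited_byte_width: int = 4,
    num_strata_deposited_byte_order: str = "big",
) -> Tuple[int, bytes]:
    """Split a buffer into (deposition count, remaining bytes).

    Assumes the count occupies the first bytes of the buffer.
    """
    header = buffer[:num_strata_deposited_byte_width]
    num_strata_deposited = int.from_bytes(
        header, byteorder=num_strata_deposited_byte_order
    )
    return num_strata_deposited, buffer[num_strata_deposited_byte_width:]


# A count of 5, two bytes of differentiae, then unrelated trailing bytes.
example_buffer = (5).to_bytes(4, "big") + b"\xab\xcd" + b"\x00\x00"
count, remainder = split_packet_header(example_buffer)
print(count, remainder)
```

The sketch does not decide where the packet ends inside `remainder`; that depends on the differentia bit width and the retention policy, which is why col_from_packet_buffer takes both as arguments.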
- col_from_records(records: Dict, differentiae_byte_bit_order: Literal['big', 'little'] = 'big') HereditaryStratigraphicColumn
Deserialize a HereditaryStratigraphicColumn from a dict composed of builtin data types.
- col_to_dataframe(column: HereditaryStratigraphicColumn) DataFrame
Create a pandas DataFrame with retained strata as rows.
- col_to_int(column: HereditaryStratigraphicColumn, num_strata_deposited_byte_order: Literal['big', 'little'] = 'big', num_strata_deposited_byte_width: int = 4) int
Serialize a HereditaryStratigraphicColumn to a binary representation as a Python int.
Integer representation is packet binary representation plus a sentry bit at the most significant bit position, to prevent loss of leading zero bits.
Uses big endian byte order.
- col_to_packet(column: HereditaryStratigraphicColumn, num_strata_deposited_byte_order: Literal['big', 'little'] = 'big', num_strata_deposited_byte_width: int = 4) Buffer
Serialize a HereditaryStratigraphicColumn to a binary buffer.
- col_to_records(column: HereditaryStratigraphicColumn) Dict
Serialize a HereditaryStratigraphicColumn to a dict composed of builtin types.
- col_to_specimen(column: HereditaryStratigraphicColumn) HereditaryStratigraphicSpecimen
Create a postprocessing representation of the differentia retained by an extant HereditaryStratigraphicColumn, indexed by deposition rank.
- pack_differentiae(strata: Iterable[HereditaryStratum], differentia_bit_width: int) str
Pack a sequence of differentiae together into a compact representation.
Returns a string with the base 64 encoded concatenation of differentiae. If differentia_bit_width is not a whole number of bytes, the first encoded byte gives the number of empty padding bits, if any, placed at the end of the concatenation to align the bitstring end to a byte boundary.
Deprecated since version 1.8.0: Use pack_differentiae_str instead.
- pack_differentiae_bytes(strata: Iterable[HereditaryStratum], differentia_bit_width: int, always_omit_num_padding_bits_header: bool = False) Buffer
Pack a sequence of differentiae together into a compact representation.
Returns a byte buffer concatenation of differentiae. If differentia_bit_width is not a whole number of bytes, the first encoded byte gives the number of empty padding bits, if any, placed at the end of the concatenation to align the bitstring end to a byte boundary.
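The padding-header scheme described above can be sketched in pure Python. `pack_bits_sketch` is a hypothetical illustration that operates on plain integers rather than HereditaryStratum objects; it follows the description in this entry and is not the library's implementation.

```python
from typing import Iterable


def pack_bits_sketch(values: Iterable[int], bit_width: int) -> bytes:
    """Concatenate fixed-width values into bytes, big-endian bit order.

    When bit_width is not a multiple of 8, a one-byte header records
    how many zero padding bits were appended at the end.
    """
    bits = ""
    for value in values:
        assert 0 <= value < (1 << bit_width), "value must fit in bit_width"
        bits += format(value, f"0{bit_width}b")

    if bit_width % 8 == 0:
        # Byte-aligned already: no header, no padding.
        return bytes(int(bits[i : i + 8], 2) for i in range(0, len(bits), 8))

    num_padding_bits = -len(bits) % 8
    bits += "0" * num_padding_bits
    body = bytes(int(bits[i : i + 8], 2) for i in range(0, len(bits), 8))
    return bytes([num_padding_bits]) + body


one_bit = pack_bits_sketch([1, 0, 1], bit_width=1)
one_byte = pack_bits_sketch([1, 255], bit_width=8)
print(one_bit.hex(), one_byte.hex())
```

Three 1-bit values occupy 3 bits, so 5 padding bits are appended and the header byte is 5. With 8-bit values no header is written.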
- pack_differentiae_str(strata: Iterable[HereditaryStratum], differentia_bit_width: int) str
Pack a sequence of differentiae together into a compact string representation.
Returns a string with the base 64 encoded concatenation of differentiae. If differentia_bit_width is not a whole number of bytes, the first encoded byte gives the number of empty padding bits, if any, placed at the end of the concatenation to align the bitstring end to a byte boundary.
- policy_from_records(records: Dict) Callable
Deserialize a stratum retention policy from a dict composed of builtin data types.
- policy_to_records(policy: Callable) Dict
Serialize a stratum retention policy to a dict composed of builtin types.
- pop_from_records(records: Dict, progress_wrap: Callable = <function <lambda>>, mutate: bool = False) List[HereditaryStratigraphicColumn]
Deserialize a sequence of HereditaryStratigraphicColumns from a dict composed of builtin types.
Parameters
- records: dict
Data to deserialize.
- progress_wrap: Callable, optional
Wrapper applied around generation iterator and row generator for final phylogeny compilation process.
Pass tqdm or equivalent to display progress bars.
- mutate: bool, default False
Are side effects on the input argument records allowed?
Returns
- population: List[HereditaryStratigraphicColumn]
Deserialized population of HereditaryStratigraphicColumns.
- pop_to_assemblage(columns: Iterable[HereditaryStratigraphicColumn], progress_wrap: Callable = <function <lambda>>) HereditaryStratigraphicAssemblage
Create a postprocessing representation of the differentia retained by a collection of HereditaryStratigraphicColumns.
Parameters
- columns: iterable of HereditaryStratigraphicColumn
Data to serialize.
- progress_wrap: Callable, optional
Wrapper applied around generation iterator and row generator for final phylogeny compilation process.
Pass tqdm or equivalent to display progress bars.
- pop_to_dataframe(columns: Iterable[HereditaryStratigraphicColumn], progress_wrap: Callable = <function <lambda>>) DataFrame
Create a pandas DataFrame summarizing several columns, with retained strata as rows.
Parameters
- columns: iterable of HereditaryStratigraphicColumn
Data to serialize.
- progress_wrap: Callable, optional
Wrapper applied around generation iterator and row generator for final phylogeny compilation process.
Pass tqdm or equivalent to display progress bars.
- pop_to_records(columns: Iterable[HereditaryStratigraphicColumn], progress_wrap: Callable = <function <lambda>>) Dict
Serialize a sequence of HereditaryStratigraphicColumns to a dict composed of builtin types.
Parameters
- columns: iterable of HereditaryStratigraphicColumn
Data to serialize.
- progress_wrap: Callable, default identity function
Wrapper applied around generation iterator and row generator for final phylogeny compilation process.
Pass tqdm or equivalent to display progress bars.
- specimen_from_records(records: Dict) HereditaryStratigraphicSpecimen
Deserialize a HereditaryStratigraphicSpecimen from a dict composed of builtin data types.
See Also
- HereditaryStratigraphicSpecimen
Postprocessing representation of the differentia retained by an extant HereditaryStratigraphicColumn, indexed by deposition rank.
- surf_from_hex(hex_string: str, dstream_algo: ModuleType, *, dstream_S: int, dstream_storage_bitoffset: int | None = None, dstream_storage_bitwidth: int, dstream_T_bitoffset: int = 0, dstream_T_bitwidth: int = 32) HereditaryStratigraphicSurface
Deserialize a HereditaryStratigraphicSurface object from a hex string representation.
Hex string representation needs exactly two contiguous parts: 1. dstream_T (which is the number of depositions elapsed), and 2. dstream_storage (which holds all the stored differentiae).
Data in hex string representation should use big-endian byte order.
Parameters
- hex_string: str
Hex string to be parsed, which can be uppercase or lowercase.
- dstream_algo: module
Dstream algorithm for curation of retained differentia.
- dstream_storage_bitoffset: int, default dstream_T_bitwidth
Number of bits before the storage.
- dstream_storage_bitwidth: int
Number of bits used for storage.
- dstream_T_bitoffset: int, default 0
Number of bits before dstream_T.
- dstream_T_bitwidth: int, default 32
Number of bits used to store dstream_T.
- dstream_S: int
Number of buffer sites available to store differentiae.
Determines how many differentiae are unpacked from storage.
See Also
- surf_to_hex()
Serialize a surface into a hex string.
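How the bit offsets and bit widths select fields from the hex string can be sketched with string slicing. `slice_hex_field` is a hypothetical helper, and the sketch assumes every offset and width is a multiple of 4 bits so that fields align with hex characters. The example data (a 32-bit counter followed by four 8-bit differentiae) is made up for illustration.

```python
from typing import List


def slice_hex_field(hex_string: str, bitoffset: int, bitwidth: int) -> str:
    """Return the hex characters covering [bitoffset, bitoffset + bitwidth)."""
    assert bitoffset % 4 == 0 and bitwidth % 4 == 0, "sketch needs nibble alignment"
    start = bitoffset // 4
    return hex_string[start : start + bitwidth // 4]


hex_string = "0000002aDEADBEEF"  # mixed case is fine for int(..., 16)

# dstream_T: 32 bits at offset 0, big-endian.
dstream_T = int(slice_hex_field(hex_string, bitoffset=0, bitwidth=32), 16)

# dstream_storage: 32 bits starting right after dstream_T.
storage_hex = slice_hex_field(hex_string, bitoffset=32, bitwidth=32)

# With dstream_S = 4, each differentia is 32 / 4 = 8 bits (2 hex characters).
dstream_S = 4
chars_per_item = len(storage_hex) // dstream_S
differentiae: List[int] = [
    int(storage_hex[i : i + chars_per_item], 16)
    for i in range(0, len(storage_hex), chars_per_item)
]
print(dstream_T, differentiae)
```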
- surf_to_hex(surface: HereditaryStratigraphicSurface, *, dstream_T_bitwidth: int = 32) str
Serialize a HereditaryStratigraphicSurface object into a hex string representation.
Serialized data comprises two components: dstream_T (the number of depositions elapsed) and dstream_storage (binary data of differentia values).
The hex layout used is:
0x…
########**************************************************
^       ^
|       dstream_storage, length = item_bitwidth / 4 * dstream_S
dstream_T, length = dstream_T_bitwidth / 4
This hex string can be reconstituted into a HereditaryStratigraphicSurface object by calling HereditaryStratigraphicSurface.from_hex() with the following parameters:
dstream_T_bitoffset = 0
dstream_T_bitwidth = dstream_T_bitwidth
dstream_storage_bitoffset = dstream_T_bitwidth
dstream_storage_bitwidth = self.S * item_bitwidth
Parameters
- item_bitwidth: int
Number of storage bits used per differentia.
- dstream_T_bitwidth: int, default 32
Number of bits used to store count of elapsed depositions.
See Also
- surf_from_hex()
Deserialize a surface from a hex string.
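The layout above can be reproduced for plain integers with zero-padded hex formatting. `layout_hex` is a hypothetical helper for illustration only: it takes the counter and the stored values directly rather than a HereditaryStratigraphicSurface, and it assumes item_bitwidth and dstream_T_bitwidth are multiples of 4.

```python
from typing import Sequence


def layout_hex(
    dstream_T: int,
    items: Sequence[int],
    item_bitwidth: int,
    dstream_T_bitwidth: int = 32,
) -> str:
    """Concatenate the counter and stored items as fixed-width hex fields."""
    t_chars = dstream_T_bitwidth // 4
    item_chars = item_bitwidth // 4
    t_hex = format(dstream_T, f"0{t_chars}x")
    storage_hex = "".join(format(item, f"0{item_chars}x") for item in items)
    return t_hex + storage_hex


# 42 depositions elapsed, four stored 8-bit differentiae (dstream_S = 4).
hex_string = layout_hex(42, [222, 173, 190, 239], item_bitwidth=8)
print(hex_string)
```

The result has dstream_T_bitwidth / 4 characters for the counter followed by item_bitwidth / 4 * dstream_S characters of storage, matching the lengths in the diagram.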
- surf_to_specimen(surface: HereditaryStratigraphicSurface) HereditaryStratigraphicSpecimen
Convert a HereditaryStratigraphicSurface to a HereditaryStratigraphicSpecimen.
Parameters
- surface: HereditaryStratigraphicSurface
The surface to convert.
Returns
- HereditaryStratigraphicSpecimen
Specimen with differentia indexed by retained ranks.
See Also
- col_to_specimen :
Convert a HereditaryStratigraphicColumn to a specimen.
- unassemblage_from_records(records: Dict, progress_wrap: Callable = <function <lambda>>, mutate: bool = False) List[HereditaryStratigraphicSpecimen]
Deserialize a population of HereditaryStratigraphicColumns into a list of HereditaryStratigraphicSpecimens from a dict composed of builtin types.
Parameters
- records: dict
Data to deserialize.
- progress_wrap: Callable, optional
Wrapper applied around generation iterator and row generator for final phylogeny compilation process.
Pass tqdm or equivalent to display progress bars.
- mutate: bool, default False
Are side effects on the input argument records allowed?
See Also
- HereditaryStratigraphicSpecimen
Postprocessing representation of the differentia retained by an extant HereditaryStratigraphicColumn, indexed by deposition rank.
- HereditaryStratigraphicAssemblage
A collection of HereditaryStratigraphicSpecimens, padded to include entries for all ranks retained by any specimen within the assemblage.
- unpack_differentiae(packed_differentiae: str, differentia_bit_width: int) Iterator[int]
Unpack a compact, concatenated base 64 representation into a sequence with each element represented as a distinct integer.
Deprecated since version 1.8.0: Use unpack_differentiae_str instead.
- unpack_differentiae_bytes(packed_differentiae: Buffer, differentia_bit_width: int, differentiae_byte_bit_order: Literal['big', 'little'] = 'big', num_packed_differentia: int | None = None) Iterator[int]
Unpack a compact, concatenated byte buffer representation into a sequence with each element represented as a distinct integer.
- unpack_differentiae_str(packed_differentiae: str, differentia_bit_width: int, differentiae_byte_bit_order: Literal['big', 'little'] = 'big', num_packed_differentia: int | None = None) Iterator[int]
Unpack a compact, concatenated base 64 representation into a sequence with each element represented as a distinct integer.
Notes
Specifying num_packed_differentia implies that packed_differentiae has no header giving the number of padding bits.
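The interaction between the padding header and num_packed_differentia can be sketched in pure Python. `unpack_sketch` is a hypothetical illustration based on the packing description in this module (a one-byte padding count in front when the bit width is not a whole number of bytes). It is not the library's implementation and it ignores differentiae_byte_bit_order.

```python
import base64
from typing import Iterator, Optional


def unpack_sketch(
    packed: str, bit_width: int, num_packed: Optional[int] = None
) -> Iterator[int]:
    """Yield fixed-width integers from a base 64 encoded bitstring."""
    raw = base64.b64decode(packed)
    if not raw:
        return
    if num_packed is not None:
        # Count supplied by the caller: no header byte is expected.
        bits = "".join(format(byte, "08b") for byte in raw)
        bits = bits[: num_packed * bit_width]
    elif bit_width % 8:
        # First byte is the header giving the number of padding bits.
        num_padding_bits, body = raw[0], raw[1:]
        bits = "".join(format(byte, "08b") for byte in body)
        bits = bits[: len(bits) - num_padding_bits]
    else:
        # Byte-aligned widths need neither header nor padding.
        bits = "".join(format(byte, "08b") for byte in raw)
    for i in range(0, len(bits), bit_width):
        yield int(bits[i : i + bit_width], 2)


with_header = base64.b64encode(b"\x05\xa0").decode("ascii")
without_header = base64.b64encode(b"\xa0").decode("ascii")

from_header = list(unpack_sketch(with_header, bit_width=1))
from_count = list(unpack_sketch(without_header, bit_width=1, num_packed=3))
print(from_header, from_count)
```

Both calls recover the same three 1-bit values: the first reads the padding count from the header, while the second relies on the explicit element count to discard the trailing padding.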