Demo: Real-time Ping Derby
This demonstration showcases end-to-end application of hstrat
to track phylogenetic history of a real-time evolutionary process using the hstrat
library. This simulation dispatches genomes out over-the-wire on round trips to various web servers via ping requests. This network transport mechanism is intended to demonstrate conditions analogous to distributed (i.e., many-CPU) evolutionary simulations, where trivial centralized ancestry tracking is not possible.
Digital “genomes” in this simulation comprise a four character domain name (e.g., abcd
for abcd.com
) that probabilistically mutate from one generation to the next.
Each generation, all genomes in the population are serialized and then dispatched as payloads within ping requests to their encoded domain. The first n
ping responses received back are deserialized and used to create the next generation. In this way, the availaibilty of genomes’ encoded domain to expediently return ping responses influences their selection for reproduction.
The functional content (i.e., domain name) of genomes is supplemented with hstrat instrumentation to allow for phylogenetic reconstruction of simulation history. This instrumentation is also sent over the wire, bundled with the functional genome content.
The key features of hstrat
demonstrated include: - Use of hstrat’s stratum retention policies to balance accuracy of phylogenetic reconstruction against the constraint of payload size. - Serialization and deserialization of genomes (functional content plus instrumentation) for transmission in ping payloads. - Reconstruction of evolutionary history from the final population’s hstrat instrumentation, allowing visualization of phylogenetic history to understand evolutionary dynamics over time.
The simulation proceeds by initializing a population from a common ancestor, subjecting this population to mutations and selections across generations, and finally analyzing the resulting phylogenetic tree to infer the evolutionary history.
Set Up: Environment, Dependencies, and Parameters
In addition to hstrat, we’ll use some tools from mmore500/alife-phylogeny-tutorial
to perform ping operations.
[ ]:
# environment...
!find . -name . -o -prune -exec rm -rf -- {} +
!git init
!git remote add origin https://github.com/mmore500/alife-phylogeny-tutorial.git
!git fetch origin
!git checkout 007340472e021588636b50157c7d3045269e707f
!python3 -m pip install -r requirements.txt
[2]:
# dependencies...
import random
import string
import typing
import alifedata_phyloinformatics_convert as apc
from hstrat import hstrat; print(f"Demo uses hstrat v{hstrat.__version__}")
import pandas as pd
import typing_extensions
import pylib # local Python library @ ./pylib/
Demo uses hstrat v1.8.2
[3]:
# parameters...
# how many characters can genomes' domain string be?
TARGET_DOMAIN_LEN: int = 4
CHAR_MUTATE_RATE: float = 0.1
N_POP: int = 8
N_GEN: int = 10
# how many copies can each genome make of itself
# per reproduction event
# i.e., how many outgoing pings to send at once
PING_COPY_COUNT: int = 2
# use 1 byte differentia values for hstrat instrumentation
DIFFERENTIA_BIT_WIDTH: int = 8
# use 4 bytes to store generation counter when serializing hstrat instrumentation
GEN_COUNTER_BYTE_WIDTH: int = 4
Choose Retention Policy
Configure stratum retention policy to manage trade-off between instrumentation size (i.e., bytes occupied) vs. reconstruction accuracy.
We don’t want our ping payloads to get too large, so we will pick a retention policy that keeps instrumentation size below a fixed size cap. Let’s use the curbed recency-proportional resolution stratum retention algorithm, which is a good go-to choice for space-constrained evolutionary applications of hstrat.
Setting up a retention policy based on this algorithm requires calculation of the maximum number of differentia we would like to retain.
For this example, suppose a 32 byte size budget for ping payload. Allocate 4 bytes for target domain string (functional genome content).
Allocating another 4 bytes for the generation counter component of hstrat instrumentation will allow us up to 4,294,967,295 generations, which is plenty for this use case.
We have 24 bytes left, so at 1 byte per differentia, we can accomodate up to 24 differentia.
[4]:
ping_stratum_retention_policy = (
hstrat.recency_proportional_resolution_curbed_algo.Policy(
size_curb=24, # max num differentia retained at any one time
)
)
Define Genome
Define a genome object to bundle the usual functional genetic information (i.e., influences phenotype/fitness) with purely-instrumentative hstrat information, which is invisible to fitness evaluation and only used to observe simulation history — just like you might use a generation counter or mutation counter in other circumstances.
The genome object applies mutation to functional genetic information (in this case, the domain to ping against) and the column update process to hstrat instrumentation when replicated to create an offspring.
The to_packet
and from_packet
methods handle serialization/deserialization so that genome objects can be converted to raw bytes to be sent off in a ping payload and then read back from raw bytes when the ping payload returns.
Note the use of built-in library tools to manage serialization/deserialization of hstrat instrumentation. In addition to binary format used here, the library also includes built-in tools to serialize/deserialize to a variety of plain-text formats.
[5]:
class PingGenome:
# where to ping this genome against
target_domain: str
# instrumentation to facilitate phylogenetic inference
hstrat_column: hstrat.HereditaryStratigraphicColumn
def __init__(
self: "PingGenome",
target_domain: typing.Optional[str] = None,
hstrat_column: typing.Optional[hstrat.HereditaryStratigraphicColumn] = None,
):
if target_domain is None:
# create random target domain
target_domain = "".join(
random.choice(string.ascii_lowercase)
for __ in range(TARGET_DOMAIN_LEN)
)
self.target_domain = target_domain
if hstrat_column is None:
self.hstrat_column = hstrat.HereditaryStratigraphicColumn(
# stratum_retention_policy: typing.Any
# Policy struct that specifies the set of strata ranks
# that should be pruned from a hereditary
# stratigraphic column when the nth stratum is deposited.
stratum_retention_policy=ping_stratum_retention_policy,
# always_store_rank_in_stratum : bool, optional
# Should the deposition rank be stored as a data member of generated
# strata, even if not strictly necessary?
always_store_rank_in_stratum=False,
# stratum_differentia_bit_width : int, optional
# The bit width of the generated differentia. Default 64, allowing
# for 2^64 distinct values.
stratum_differentia_bit_width=DIFFERENTIA_BIT_WIDTH,
)
else:
self.hstrat_column = hstrat_column
def mutate(self: "PingGenome") -> None:
# for each target_domain character,
# apply a scramble event with CHAR_MUTATE_RATE probability
self.target_domain = "".join(
random.choice(string.ascii_lowercase)
if random.random() < CHAR_MUTATE_RATE
else char
for char in self.target_domain
)
def create_offspring(self: "PingGenome") -> "PingGenome":
offspring = PingGenome(
target_domain=self.target_domain, # inherit target_domain
hstrat_column=(
# register elapsed generation w/ hstrat instrumentation,
# then pass instrumentation along to offspring
self.hstrat_column.CloneDescendant()
),
)
offspring.mutate() # mutate target_domain
return offspring
def to_packet(self: "PingGenome") -> typing_extensions.Buffer:
# serialize genome to a binary string
# that can be transmitted within ping payload
annotation_packet_bytes = hstrat.col_to_packet(
self.hstrat_column,
num_strata_deposited_byte_width=GEN_COUNTER_BYTE_WIDTH,
)
return self.target_domain.encode() + annotation_packet_bytes
@staticmethod
def from_packet(data: typing_extensions.Buffer) -> "PingGenome":
# deserialize genome from a binary string
# i.e., extracted from a ping payload
# first TARGET_DOMAIN_LEN bytes are target_domain string
target_domain = data[:TARGET_DOMAIN_LEN].decode()
# all the rest is the hstrat instrumentation
hstrat_column = hstrat.col_from_packet_buffer(
packet_buffer=data[TARGET_DOMAIN_LEN:],
differentia_bit_width=DIFFERENTIA_BIT_WIDTH,
num_strata_deposited_byte_width=GEN_COUNTER_BYTE_WIDTH,
stratum_retention_policy=ping_stratum_retention_policy,
)
# put deserialized components together into a genome object
return PingGenome(
target_domain=target_domain,
hstrat_column=hstrat_column,
)
Define Selection
Process one generation of evolution on a population of PingGenome
’s and return “winning” offspring who made it back first as the next population.
[6]:
def elapse_generation(
population: typing.List[PingGenome],
) -> typing.List[PingGenome]:
# manages socket resources, etc.
pinger = pylib.PayloadPinger()
# loop until we get enough packets back
# to fill next population to same size as current population
next_population_packets: typing.List[typing_extensions.Buffer] = []
while len(next_population_packets) < len(population):
# how many more packets do we need?
num_empty_next_population_slots = len(population) - len(
next_population_packets
)
# dispatch ping requests
for __ in range(num_empty_next_population_slots):
# selection is random among current population
selection = random.choice(population)
# create several offspring and dispatch into ping payloads
for __ in range(PING_COPY_COUNT):
# create_offspring makes genome copy, applies mutation,
# & registers elapsed generation w/ hstrat instrumentaiton
offspring = selection.create_offspring()
# figure out where offspring points to
# and dispatch it as a ping payload
target_url = offspring.target_domain + ".com"
pinger.send(target_url, offspring.to_packet())
# log request event
print(f"---> packet sent to {target_url}")
# collect all available ping responses
# & extact their payloads into next_population_packets
# until we have enough packets for next population
while len(next_population_packets) < len(population):
maybe_packet = pinger.read()
if maybe_packet is None:
break # no more ping responses to read right now
else:
next_population_packets.append(maybe_packet)
# log response event
packet_domain = maybe_packet[:TARGET_DOMAIN_LEN].decode()
print(f" <=== packet returned from {packet_domain}")
# deserialize packets back into genome objects
next_population: typing.List[PingGenome] = [
PingGenome.from_packet(packet) for packet in next_population_packets
]
return next_population
Do Evolution
Create a common ancestor, initialize population of N_POP
offspring of common ancestor, and update population using selection process N_GEN
times.
[7]:
# create a common ancestor
common_ancestor = PingGenome()
# initialize population with offspring of common ancestor
population = [common_ancestor.create_offspring() for __ in range(N_POP)]
# update population N_GEN times
for generation in range(N_GEN):
print(f"\n------- generation {generation} -------")
population = elapse_generation(population)
------- generation 0 -------
---> packet sent to sjqa.com
---> packet sent to ooqa.com
---> packet sent to ojhd.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to ojua.com
---> packet sent to kjqa.com
---> packet sent to ojqd.com
---> packet sent to gjqd.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to bjqh.com
---> packet sent to wjqh.com
---> packet sent to ojqa.com
---> packet sent to ojqa.com
<=== packet returned from ojhd
<=== packet returned from sjqa
<=== packet returned from ojqd
<=== packet returned from ojqd
<=== packet returned from ojqd
<=== packet returned from ojua
<=== packet returned from ojqd
<=== packet returned from kjqa
------- generation 1 -------
---> packet sent to ojqd.com
---> packet sent to ojqb.com
---> packet sent to ojqi.com
---> packet sent to kjqd.com
---> packet sent to ojuv.com
---> packet sent to ojua.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to zjkd.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to qoua.com
---> packet sent to ojua.com
---> packet sent to tjqd.com
---> packet sent to ojqd.com
<=== packet returned from ojqd
<=== packet returned from ojqa
<=== packet returned from ojqa
<=== packet returned from ojqi
<=== packet returned from ojua
<=== packet returned from ojqd
<=== packet returned from ojqd
<=== packet returned from ojqd
------- generation 2 -------
---> packet sent to oeqn.com
---> packet sent to ujqd.com
---> packet sent to ojqd.com
---> packet sent to onqd.com
---> packet sent to ojza.com
---> packet sent to bjua.com
---> packet sent to ojqv.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to opua.com
---> packet sent to ojua.com
---> packet sent to ojua.com
---> packet sent to ojua.com
---> packet sent to ojpa.com
---> packet sent to ojqa.com
<=== packet returned from ojqd
<=== packet returned from oeqn
<=== packet returned from ojqd
<=== packet returned from tjqd
<=== packet returned from onqd
<=== packet returned from ojza
<=== packet returned from bjua
<=== packet returned from ojqd
------- generation 3 -------
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to gjuk.com
---> packet sent to bjua.com
---> packet sent to mjqd.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to tjqd.com
---> packet sent to tjqd.com
---> packet sent to ojga.com
---> packet sent to ojza.com
---> packet sent to ojqz.com
---> packet sent to ojqd.com
---> packet sent to onqd.com
---> packet sent to onqd.com
<=== packet returned from ojpa
<=== packet returned from ojqd
<=== packet returned from ojqd
<=== packet returned from ojqa
<=== packet returned from bjua
<=== packet returned from gjuk
<=== packet returned from ojqd
<=== packet returned from ojqd
------- generation 4 -------
---> packet sent to gjuk.com
---> packet sent to gkuk.com
---> packet sent to ojod.com
---> packet sent to ojqd.com
---> packet sent to gjuk.com
---> packet sent to gjuk.com
---> packet sent to ojqh.com
---> packet sent to ofqd.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to ojmd.com
---> packet sent to ojqa.com
---> packet sent to ojqa.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
<=== packet returned from ojqd
<=== packet returned from gjuk
<=== packet returned from gkuk
<=== packet returned from onqd
<=== packet returned from onqd
<=== packet returned from ojod
<=== packet returned from ojqd
<=== packet returned from ojqh
------- generation 5 -------
---> packet sent to ojwd.com
---> packet sent to ojod.com
---> packet sent to ojqd.com
---> packet sent to ojpd.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to mpqw.com
---> packet sent to gkfk.com
---> packet sent to gvdk.com
---> packet sent to gjuk.com
---> packet sent to gjuk.com
---> packet sent to onqd.com
---> packet sent to onqd.com
---> packet sent to oiod.com
---> packet sent to ojod.com
<=== packet returned from ojmd
<=== packet returned from ojqd
<=== packet returned from ojqd
<=== packet returned from ojqa
<=== packet returned from ojqa
<=== packet returned from ojod
<=== packet returned from ojqd
<=== packet returned from ojqd
------- generation 6 -------
---> packet sent to ojod.com
---> packet sent to ljod.com
---> packet sent to ojqd.com
---> packet sent to njqd.com
---> packet sent to ojod.com
---> packet sent to ojod.com
---> packet sent to ouqd.com
---> packet sent to opqd.com
---> packet sent to ojod.com
---> packet sent to opod.com
---> packet sent to ojqa.com
---> packet sent to ojqa.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to ormd.com
---> packet sent to ojmd.com
<=== packet returned from ojod
<=== packet returned from ljod
<=== packet returned from ojqd
<=== packet returned from oiod
<=== packet returned from ojod
<=== packet returned from ojod
<=== packet returned from ouqd
<=== packet returned from ojod
------- generation 7 -------
---> packet sent to ljld.com
---> packet sent to ljod.com
---> packet sent to ouod.com
---> packet sent to cjld.com
---> packet sent to oiod.com
---> packet sent to oiod.com
---> packet sent to oiod.com
---> packet sent to oiod.com
---> packet sent to ojqd.com
---> packet sent to ojnd.com
---> packet sent to ojod.com
---> packet sent to ojod.com
---> packet sent to ojod.com
---> packet sent to ojod.com
---> packet sent to ojod.com
---> packet sent to ojom.com
<=== packet returned from opod
<=== packet returned from ormd
<=== packet returned from ojmd
<=== packet returned from ljld
<=== packet returned from ljod
<=== packet returned from ouod
<=== packet returned from cjld
<=== packet returned from ojqd
------- generation 8 -------
---> packet sent to ouod.com
---> packet sent to ouod.com
---> packet sent to ljld.com
---> packet sent to ljld.com
---> packet sent to cjld.com
---> packet sent to cjld.com
---> packet sent to ouod.com
---> packet sent to ouod.com
---> packet sent to ojmd.com
---> packet sent to ojmd.com
---> packet sent to ocod.com
---> packet sent to opeb.com
---> packet sent to ljld.com
---> packet sent to ljld.com
---> packet sent to ouod.com
---> packet sent to onot.com
<=== packet returned from ljld
<=== packet returned from ljld
<=== packet returned from cjld
<=== packet returned from ouod
<=== packet returned from cjld
<=== packet returned from ouod
<=== packet returned from ojmd
<=== packet returned from ojmd
------- generation 9 -------
---> packet sent to djld.com
---> packet sent to ljld.com
---> packet sent to ojmd.com
---> packet sent to ojmz.com
---> packet sent to ojid.com
---> packet sent to olmd.com
---> packet sent to ljld.com
---> packet sent to ljlp.com
---> packet sent to ojmd.com
---> packet sent to ojmd.com
---> packet sent to cjld.com
---> packet sent to cjld.com
---> packet sent to ljld.com
---> packet sent to ljld.com
---> packet sent to ouod.com
---> packet sent to ouod.com
<=== packet returned from djld
<=== packet returned from ljld
<=== packet returned from ojmd
<=== packet returned from ojid
<=== packet returned from ljld
<=== packet returned from olmd
<=== packet returned from ljld
<=== packet returned from ljld
Extract Annotations and Build Tree
Isolate hstrat columns from population at end of simulation. Use hstrat.build_tree
to synthesize phylogeny estimate from population members’ hstrat instrumentation.
[8]:
# hstrat instrumentation from population at end of simulation
extant_annotations = [
# extract hstrat columns from genomes
# & freeze dynamic instrumentation as "specimens,"
# which are optimized for postprocessing analysis
hstrat.col_to_specimen(genome.hstrat_column)
for genome in population
]
# estimated_phylogeny is stored in alife data standards format
# https://alife-data-standards.github.io/alife-data-standards/phylogeny.html
estimated_phylogeny: pd.DataFrame = hstrat.build_tree(
population=extant_annotations,
taxon_labels=[genome.target_domain for genome in population],
# the `build_tree` function tracks the current best-known general
# purpose reconstruction algorithm
# pin to the current version (e.g., "1.7.2") for long-term stability
# or pin to hstrat.__version__ to track latest algorithm updates
version_pin=hstrat.__version__,
)
Visualize Phylogeny
Draw an ascii visualization of reconstructed phylogeny.
[9]:
# translate to dendropy (which provides lots of phylogenetics tools)
# via alifedata phyloinformatics conversion tool
dendropy_tree = apc.alife_dataframe_to_dendropy_tree(
estimated_phylogeny,
setup_edge_lengths=True,
)
# draw the reconstruction!
print(dendropy_tree.as_ascii_plot(plot_metric="age", width=50))
/----- ojid
/----+
/---------+ \----- olmd
| |
| \---------- ojmd
|
+-----------------------+ /----- djld
| |
| /----+----- ljld
| | |
\---------+ \----- ljld
|
| /----- ljld
\----+
\----- ljld