Open In Colab

Demo: Real-time Ping Derby

This demonstration showcases end-to-end application of hstrat to track phylogenetic history of a real-time evolutionary process using the hstrat library. This simulation dispatches genomes out over-the-wire on round trips to various web servers via ping requests. This network transport mechanism is intended to demonstrate conditions analogous to distributed (i.e., many-CPU) evolutionary simulations, where trivial centralized ancestry tracking is not possible.

Digital “genomes” in this simulation comprise a four character domain name (e.g., abcd for abcd.com) that probabilistically mutate from one generation to the next.

Each generation, all genomes in the population are serialized and then dispatched as payloads within ping requests to their encoded domain. The first n ping responses received back are deserialized and used to create the next generation. In this way, the availaibilty of genomes’ encoded domain to expediently return ping responses influences their selection for reproduction.

The functional content (i.e., domain name) of genomes is supplemented with hstrat instrumentation to allow for phylogenetic reconstruction of simulation history. This instrumentation is also sent over the wire, bundled with the functional genome content.

The key features of hstrat demonstrated include: - Use of hstrat’s stratum retention policies to balance accuracy of phylogenetic reconstruction against the constraint of payload size. - Serialization and deserialization of genomes (functional content plus instrumentation) for transmission in ping payloads. - Reconstruction of evolutionary history from the final population’s hstrat instrumentation, allowing visualization of phylogenetic history to understand evolutionary dynamics over time.

The simulation proceeds by initializing a population from a common ancestor, subjecting this population to mutations and selections across generations, and finally analyzing the resulting phylogenetic tree to infer the evolutionary history.

Set Up: Environment, Dependencies, and Parameters

In addition to hstrat, we’ll use some tools from mmore500/alife-phylogeny-tutorial to perform ping operations.

[ ]:
# environment...
!find . -name . -o -prune -exec rm -rf -- {} +
!git init
!git remote add origin https://github.com/mmore500/alife-phylogeny-tutorial.git
!git fetch origin
!git checkout 007340472e021588636b50157c7d3045269e707f
!python3 -m pip install -r requirements.txt
[2]:
# dependencies...
import random
import string
import typing

import alifedata_phyloinformatics_convert as apc
from hstrat import hstrat; print(f"Demo uses hstrat v{hstrat.__version__}")
import pandas as pd
import typing_extensions

import pylib  # local Python library @ ./pylib/
Demo uses hstrat v1.8.2
[3]:
# parameters...
# how many characters can genomes' domain string be?
TARGET_DOMAIN_LEN: int = 4
CHAR_MUTATE_RATE: float = 0.1
N_POP: int = 8
N_GEN: int = 10

# how many copies can each genome make of itself
# per reproduction event
# i.e., how many outgoing pings to send at once
PING_COPY_COUNT: int = 2

# use 1 byte differentia values for hstrat instrumentation
DIFFERENTIA_BIT_WIDTH: int = 8

# use 4 bytes to store generation counter when serializing hstrat instrumentation
GEN_COUNTER_BYTE_WIDTH: int = 4

Choose Retention Policy

Configure stratum retention policy to manage trade-off between instrumentation size (i.e., bytes occupied) vs. reconstruction accuracy.

We don’t want our ping payloads to get too large, so we will pick a retention policy that keeps instrumentation size below a fixed size cap. Let’s use the curbed recency-proportional resolution stratum retention algorithm, which is a good go-to choice for space-constrained evolutionary applications of hstrat.

Setting up a retention policy based on this algorithm requires calculation of the maximum number of differentia we would like to retain.

For this example, suppose a 32 byte size budget for ping payload. Allocate 4 bytes for target domain string (functional genome content).

Allocating another 4 bytes for the generation counter component of hstrat instrumentation will allow us up to 4,294,967,295 generations, which is plenty for this use case.

We have 24 bytes left, so at 1 byte per differentia, we can accomodate up to 24 differentia.

[4]:
ping_stratum_retention_policy = (
    hstrat.recency_proportional_resolution_curbed_algo.Policy(
        size_curb=24,  # max num differentia retained at any one time
    )
)

Define Genome

Define a genome object to bundle the usual functional genetic information (i.e., influences phenotype/fitness) with purely-instrumentative hstrat information, which is invisible to fitness evaluation and only used to observe simulation history — just like you might use a generation counter or mutation counter in other circumstances.

The genome object applies mutation to functional genetic information (in this case, the domain to ping against) and the column update process to hstrat instrumentation when replicated to create an offspring.

The to_packet and from_packet methods handle serialization/deserialization so that genome objects can be converted to raw bytes to be sent off in a ping payload and then read back from raw bytes when the ping payload returns.

Note the use of built-in library tools to manage serialization/deserialization of hstrat instrumentation. In addition to binary format used here, the library also includes built-in tools to serialize/deserialize to a variety of plain-text formats.

[5]:
class PingGenome:

    # where to ping this genome against
    target_domain: str

    # instrumentation to facilitate phylogenetic inference
    hstrat_column: hstrat.HereditaryStratigraphicColumn

    def __init__(
        self: "PingGenome",
        target_domain: typing.Optional[str] = None,
        hstrat_column: typing.Optional[hstrat.HereditaryStratigraphicColumn] = None,
    ):
        if target_domain is None:
            # create random target domain
            target_domain = "".join(
                random.choice(string.ascii_lowercase)
                for __ in range(TARGET_DOMAIN_LEN)
            )
        self.target_domain = target_domain

        if hstrat_column is None:
            self.hstrat_column = hstrat.HereditaryStratigraphicColumn(
                # stratum_retention_policy: typing.Any
                # Policy struct that specifies the set of strata ranks
                # that should be pruned from a hereditary
                # stratigraphic column when the nth stratum is deposited.
                stratum_retention_policy=ping_stratum_retention_policy,
                # always_store_rank_in_stratum : bool, optional
                # Should the deposition rank be stored as a data member of generated
                # strata, even if not strictly necessary?
                always_store_rank_in_stratum=False,
                # stratum_differentia_bit_width : int, optional
                # The bit width of the generated differentia. Default 64, allowing
                # for 2^64 distinct values.
                stratum_differentia_bit_width=DIFFERENTIA_BIT_WIDTH,
            )
        else:
            self.hstrat_column = hstrat_column

    def mutate(self: "PingGenome") -> None:
        # for each target_domain character,
        # apply a scramble event with CHAR_MUTATE_RATE probability
        self.target_domain = "".join(
            random.choice(string.ascii_lowercase)
            if random.random() < CHAR_MUTATE_RATE
            else char
            for char in self.target_domain
        )

    def create_offspring(self: "PingGenome") -> "PingGenome":
        offspring = PingGenome(
            target_domain=self.target_domain,  # inherit target_domain
            hstrat_column=(
                # register elapsed generation w/ hstrat instrumentation,
                # then pass instrumentation along to offspring
                self.hstrat_column.CloneDescendant()
            ),
        )
        offspring.mutate()  # mutate target_domain
        return offspring

    def to_packet(self: "PingGenome") -> typing_extensions.Buffer:
        # serialize genome to a binary string
        # that can be transmitted within ping payload
        annotation_packet_bytes = hstrat.col_to_packet(
            self.hstrat_column,
            num_strata_deposited_byte_width=GEN_COUNTER_BYTE_WIDTH,
        )
        return self.target_domain.encode() + annotation_packet_bytes

    @staticmethod
    def from_packet(data: typing_extensions.Buffer) -> "PingGenome":
        # deserialize genome from a binary string
        # i.e., extracted from a ping payload

        # first TARGET_DOMAIN_LEN bytes are target_domain string
        target_domain = data[:TARGET_DOMAIN_LEN].decode()

        # all the rest is the hstrat instrumentation
        hstrat_column = hstrat.col_from_packet_buffer(
            packet_buffer=data[TARGET_DOMAIN_LEN:],
            differentia_bit_width=DIFFERENTIA_BIT_WIDTH,
            num_strata_deposited_byte_width=GEN_COUNTER_BYTE_WIDTH,
            stratum_retention_policy=ping_stratum_retention_policy,
        )

        # put deserialized components together into a genome object
        return PingGenome(
            target_domain=target_domain,
            hstrat_column=hstrat_column,
        )

Define Selection

Process one generation of evolution on a population of PingGenome’s and return “winning” offspring who made it back first as the next population.

[6]:
def elapse_generation(
    population: typing.List[PingGenome],
) -> typing.List[PingGenome]:

    # manages socket resources, etc.
    pinger = pylib.PayloadPinger()

    # loop until we get enough packets back
    # to fill next population to same size as current population
    next_population_packets: typing.List[typing_extensions.Buffer] = []
    while len(next_population_packets) < len(population):

        # how many more packets do we need?
        num_empty_next_population_slots = len(population) - len(
            next_population_packets
        )

        # dispatch ping requests
        for __ in range(num_empty_next_population_slots):

            # selection is random among current population
            selection = random.choice(population)
            # create several offspring and dispatch into ping payloads
            for __ in range(PING_COPY_COUNT):
                # create_offspring makes genome copy, applies mutation,
                # & registers elapsed generation w/ hstrat instrumentaiton
                offspring = selection.create_offspring()

                # figure out where offspring points to
                # and dispatch it as a ping payload
                target_url = offspring.target_domain + ".com"
                pinger.send(target_url, offspring.to_packet())
                # log request event
                print(f"---> packet sent to {target_url}")

        # collect all available ping responses
        # & extact their payloads into next_population_packets
        # until we have enough packets for next population
        while len(next_population_packets) < len(population):

            maybe_packet = pinger.read()
            if maybe_packet is None:
                break  # no more ping responses to read right now
            else:
                next_population_packets.append(maybe_packet)

            # log response event
            packet_domain = maybe_packet[:TARGET_DOMAIN_LEN].decode()
            print(f" <=== packet returned from {packet_domain}")

    # deserialize packets back into genome objects
    next_population: typing.List[PingGenome] = [
        PingGenome.from_packet(packet) for packet in next_population_packets
    ]
    return next_population

Do Evolution

Create a common ancestor, initialize population of N_POP offspring of common ancestor, and update population using selection process N_GEN times.

[7]:
# create a common ancestor
common_ancestor = PingGenome()

# initialize population with offspring of common ancestor
population = [common_ancestor.create_offspring() for __ in range(N_POP)]

# update population N_GEN times
for generation in range(N_GEN):
    print(f"\n------- generation {generation} -------")
    population = elapse_generation(population)

------- generation 0 -------
---> packet sent to sjqa.com
---> packet sent to ooqa.com
---> packet sent to ojhd.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to ojua.com
---> packet sent to kjqa.com
---> packet sent to ojqd.com
---> packet sent to gjqd.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to bjqh.com
---> packet sent to wjqh.com
---> packet sent to ojqa.com
---> packet sent to ojqa.com
 <=== packet returned from ojhd
 <=== packet returned from sjqa
 <=== packet returned from ojqd
 <=== packet returned from ojqd
 <=== packet returned from ojqd
 <=== packet returned from ojua
 <=== packet returned from ojqd
 <=== packet returned from kjqa

------- generation 1 -------
---> packet sent to ojqd.com
---> packet sent to ojqb.com
---> packet sent to ojqi.com
---> packet sent to kjqd.com
---> packet sent to ojuv.com
---> packet sent to ojua.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to zjkd.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to qoua.com
---> packet sent to ojua.com
---> packet sent to tjqd.com
---> packet sent to ojqd.com
 <=== packet returned from ojqd
 <=== packet returned from ojqa
 <=== packet returned from ojqa
 <=== packet returned from ojqi
 <=== packet returned from ojua
 <=== packet returned from ojqd
 <=== packet returned from ojqd
 <=== packet returned from ojqd

------- generation 2 -------
---> packet sent to oeqn.com
---> packet sent to ujqd.com
---> packet sent to ojqd.com
---> packet sent to onqd.com
---> packet sent to ojza.com
---> packet sent to bjua.com
---> packet sent to ojqv.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to opua.com
---> packet sent to ojua.com
---> packet sent to ojua.com
---> packet sent to ojua.com
---> packet sent to ojpa.com
---> packet sent to ojqa.com
 <=== packet returned from ojqd
 <=== packet returned from oeqn
 <=== packet returned from ojqd
 <=== packet returned from tjqd
 <=== packet returned from onqd
 <=== packet returned from ojza
 <=== packet returned from bjua
 <=== packet returned from ojqd

------- generation 3 -------
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to gjuk.com
---> packet sent to bjua.com
---> packet sent to mjqd.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to tjqd.com
---> packet sent to tjqd.com
---> packet sent to ojga.com
---> packet sent to ojza.com
---> packet sent to ojqz.com
---> packet sent to ojqd.com
---> packet sent to onqd.com
---> packet sent to onqd.com
 <=== packet returned from ojpa
 <=== packet returned from ojqd
 <=== packet returned from ojqd
 <=== packet returned from ojqa
 <=== packet returned from bjua
 <=== packet returned from gjuk
 <=== packet returned from ojqd
 <=== packet returned from ojqd

------- generation 4 -------
---> packet sent to gjuk.com
---> packet sent to gkuk.com
---> packet sent to ojod.com
---> packet sent to ojqd.com
---> packet sent to gjuk.com
---> packet sent to gjuk.com
---> packet sent to ojqh.com
---> packet sent to ofqd.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to ojmd.com
---> packet sent to ojqa.com
---> packet sent to ojqa.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
 <=== packet returned from ojqd
 <=== packet returned from gjuk
 <=== packet returned from gkuk
 <=== packet returned from onqd
 <=== packet returned from onqd
 <=== packet returned from ojod
 <=== packet returned from ojqd
 <=== packet returned from ojqh

------- generation 5 -------
---> packet sent to ojwd.com
---> packet sent to ojod.com
---> packet sent to ojqd.com
---> packet sent to ojpd.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to mpqw.com
---> packet sent to gkfk.com
---> packet sent to gvdk.com
---> packet sent to gjuk.com
---> packet sent to gjuk.com
---> packet sent to onqd.com
---> packet sent to onqd.com
---> packet sent to oiod.com
---> packet sent to ojod.com
 <=== packet returned from ojmd
 <=== packet returned from ojqd
 <=== packet returned from ojqd
 <=== packet returned from ojqa
 <=== packet returned from ojqa
 <=== packet returned from ojod
 <=== packet returned from ojqd
 <=== packet returned from ojqd

------- generation 6 -------
---> packet sent to ojod.com
---> packet sent to ljod.com
---> packet sent to ojqd.com
---> packet sent to njqd.com
---> packet sent to ojod.com
---> packet sent to ojod.com
---> packet sent to ouqd.com
---> packet sent to opqd.com
---> packet sent to ojod.com
---> packet sent to opod.com
---> packet sent to ojqa.com
---> packet sent to ojqa.com
---> packet sent to ojqd.com
---> packet sent to ojqd.com
---> packet sent to ormd.com
---> packet sent to ojmd.com
 <=== packet returned from ojod
 <=== packet returned from ljod
 <=== packet returned from ojqd
 <=== packet returned from oiod
 <=== packet returned from ojod
 <=== packet returned from ojod
 <=== packet returned from ouqd
 <=== packet returned from ojod

------- generation 7 -------
---> packet sent to ljld.com
---> packet sent to ljod.com
---> packet sent to ouod.com
---> packet sent to cjld.com
---> packet sent to oiod.com
---> packet sent to oiod.com
---> packet sent to oiod.com
---> packet sent to oiod.com
---> packet sent to ojqd.com
---> packet sent to ojnd.com
---> packet sent to ojod.com
---> packet sent to ojod.com
---> packet sent to ojod.com
---> packet sent to ojod.com
---> packet sent to ojod.com
---> packet sent to ojom.com
 <=== packet returned from opod
 <=== packet returned from ormd
 <=== packet returned from ojmd
 <=== packet returned from ljld
 <=== packet returned from ljod
 <=== packet returned from ouod
 <=== packet returned from cjld
 <=== packet returned from ojqd

------- generation 8 -------
---> packet sent to ouod.com
---> packet sent to ouod.com
---> packet sent to ljld.com
---> packet sent to ljld.com
---> packet sent to cjld.com
---> packet sent to cjld.com
---> packet sent to ouod.com
---> packet sent to ouod.com
---> packet sent to ojmd.com
---> packet sent to ojmd.com
---> packet sent to ocod.com
---> packet sent to opeb.com
---> packet sent to ljld.com
---> packet sent to ljld.com
---> packet sent to ouod.com
---> packet sent to onot.com
 <=== packet returned from ljld
 <=== packet returned from ljld
 <=== packet returned from cjld
 <=== packet returned from ouod
 <=== packet returned from cjld
 <=== packet returned from ouod
 <=== packet returned from ojmd
 <=== packet returned from ojmd

------- generation 9 -------
---> packet sent to djld.com
---> packet sent to ljld.com
---> packet sent to ojmd.com
---> packet sent to ojmz.com
---> packet sent to ojid.com
---> packet sent to olmd.com
---> packet sent to ljld.com
---> packet sent to ljlp.com
---> packet sent to ojmd.com
---> packet sent to ojmd.com
---> packet sent to cjld.com
---> packet sent to cjld.com
---> packet sent to ljld.com
---> packet sent to ljld.com
---> packet sent to ouod.com
---> packet sent to ouod.com
 <=== packet returned from djld
 <=== packet returned from ljld
 <=== packet returned from ojmd
 <=== packet returned from ojid
 <=== packet returned from ljld
 <=== packet returned from olmd
 <=== packet returned from ljld
 <=== packet returned from ljld

Extract Annotations and Build Tree

Isolate hstrat columns from population at end of simulation. Use hstrat.build_tree to synthesize phylogeny estimate from population members’ hstrat instrumentation.

[8]:
# hstrat instrumentation from population at end of simulation
extant_annotations = [
    # extract hstrat columns from genomes
    # & freeze dynamic instrumentation as "specimens,"
    # which are optimized for postprocessing analysis
    hstrat.col_to_specimen(genome.hstrat_column)
    for genome in population
]

# estimated_phylogeny is stored in alife data standards format
# https://alife-data-standards.github.io/alife-data-standards/phylogeny.html
estimated_phylogeny: pd.DataFrame = hstrat.build_tree(
    population=extant_annotations,
    taxon_labels=[genome.target_domain for genome in population],
    # the `build_tree` function tracks the current best-known general
    # purpose reconstruction algorithm
    # pin to the current version (e.g., "1.7.2") for long-term stability
    # or pin to hstrat.__version__ to track latest algorithm updates
    version_pin=hstrat.__version__,
)

Visualize Phylogeny

Draw an ascii visualization of reconstructed phylogeny.

[9]:
# translate to dendropy (which provides lots of phylogenetics tools)
# via alifedata phyloinformatics conversion tool
dendropy_tree = apc.alife_dataframe_to_dendropy_tree(
    estimated_phylogeny,
    setup_edge_lengths=True,
)

# draw the reconstruction!
print(dendropy_tree.as_ascii_plot(plot_metric="age", width=50))
                                       /----- ojid
                                  /----+
                        /---------+    \----- olmd
                        |         |
                        |         \---------- ojmd
                        |
+-----------------------+              /----- djld
                        |              |
                        |         /----+----- ljld
                        |         |    |
                        \---------+    \----- ljld
                                  |
                                  |    /----- ljld
                                  \----+
                                       \----- ljld