Computational Genetic Genealogy

Lab 13: Connecting Individuals into Small Pedigree Structures

Core Component: This lab explores how Bonsai v3 connects individuals into small pedigree structures using genetic evidence. This process is foundational to genealogical reconstruction, as it creates the basic building blocks that can be assembled into larger family networks.

The Building Blocks of Pedigree Construction

Connecting Individuals Based on Genetic Evidence

In genetic genealogy, the process of connecting individuals into small pedigree structures requires:

  1. Identifying Relationship Types: Determining the most likely relationship between each pair of individuals
  2. Finding Optimal Connection Points: Determining where and how to connect individuals in a pedigree
  3. Building Coherent Structures: Ensuring the resulting small pedigree is biologically valid and consistent with genetic evidence
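To make these steps concrete, it helps to see the data structure they operate on. The sketch below is a minimal illustration, assuming the up-node dictionary convention used in the listings of this lab: each key is an individual, each value maps that individual's parents to a degree of 1, genotyped individuals have positive IDs, and ungenotyped placeholders have negative IDs. The helper name parents_of is hypothetical and is not part of Bonsai.

```python
# Up-node dictionary: child ID -> {parent ID: degree}.
# Positive IDs are genotyped; negative IDs are ungenotyped placeholders.
up_dct = {
    1: {-1: 1, -2: 1},   # individual 1 has two ungenotyped parents
    2: {-1: 1, -2: 1},   # individual 2 shares both parents: full sibling of 1
    3: {2: 1},           # individual 3 is a child of 2 (other parent unknown)
    -1: {},              # founders have no recorded parents
    -2: {},
}


def parents_of(node: int, up_dct: dict[int, dict[int, int]]) -> set[int]:
    """Hypothetical helper: the recorded parents of a node."""
    return set(up_dct.get(node, {}))


shared = parents_of(1, up_dct) & parents_of(2, up_dct)
print(sorted(shared))        # [-2, -1]
print(parents_of(3, up_dct)) # {2}
```

Sharing both parents is what makes individuals 1 and 2 full siblings in this representation; no relationship label is stored anywhere.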

Bonsai v3 implements this process in the connections.py module. The core operation is joining two pedigrees at chosen connection points:

import copy
from typing import Optional

def connect_pedigrees_through_points(
    id1 : int,
    id2 : int,
    pid1 : Optional[int],
    pid2 : Optional[int],
    up_dct1 : dict[int, dict[int, int]],
    up_dct2 : dict[int, dict[int, int]],
    deg1 : int,
    deg2 : int,
    num_ancs : int,
    simple : bool=True,
):
    """
    Connect up_dct1 to up_dct2 through points id1 in up_dct1
    and id2 in up_dct2. Also connect through partner points
    pid1 and pid2, if indicated. Connect id1 to id2 through
    a relationship specified by (deg1, deg2, num_ancs).
    """
    # can't connect "on" genotyped nodes
    if deg1 == deg2 == 0 and (id1 > 0 and id2 > 0) and id1 != id2:
        return []

    # can't connect "on" genotyped or non-existent partner nodes
    if deg1 == deg2 == 0 and (pid1 != pid2):
        if pid1 is None or pid2 is None:
            return []
        elif pid1 > 0 and pid2 > 0:
            return []

    up_dct1 = copy.deepcopy(up_dct1)
    up_dct2 = copy.deepcopy(up_dct2)

    if deg1 > 0:
        up_dct1, _, id1, pid1 = extend_up(
            iid=id1,
            deg=deg1,
            num_ancs=num_ancs,
            up_dct=up_dct1,
        )

    if deg2 > 0:
        up_dct2, _, id2, pid2 = extend_up(
            iid=id2,
            deg=deg2,
            num_ancs=num_ancs,
            up_dct=up_dct2,
        )

    # shift IDs so that they don't overlap
    min_id = get_min_id(up_dct1)-1
    up_dct2, id_map = shift_ids(
        ped=up_dct2,
        shift=min_id,
    )
    id2 = id_map.get(id2, id2)
    pid2 = id_map.get(pid2, pid2)

    # get a mapping of IDs in up_dct1 to match
    # with IDs in up_dct2
    if simple:
        if (pid1 is not None) and (pid2 is not None):
            id_map_list = [
                {id1 : id2, pid1 : pid2}
            ]
        else:
            id_map_list = [
                {id1 : id2}
            ]
    else:
        id_map_list = get_all_matches(
            id1=id1,
            id2=id2,
            pid1=pid1,
            pid2=pid2,
            up_dct1=up_dct1,
            up_dct2=up_dct2,
        )

    # connect up_dct1 to up_dct2 in all
    # ways specified in id_map_list
    connect_dct_list = []
    for id_map in id_map_list:
        up_dct = connect_on(
            id_map=id_map,
            up_dct1=up_dct1,
            up_dct2=up_dct2,
        )
        connect_dct_list.append(up_dct)

    return connect_dct_list

This function handles the intricate process of connecting two pedigrees through specified points, taking into account:

  • Compatibility Checks: Rejecting connections that would merge two different genotyped individuals (or mismatched partner nodes) into a single node
  • Extending Lineages: Building out the necessary ancestor nodes using extend_up
  • ID Management: Ensuring no conflicts between node IDs when combining pedigrees
  • Multiple Connection Options: Returning alternative ways to connect the pedigrees when multiple options exist

This sophisticated connection mechanism allows Bonsai v3 to construct small pedigree structures that accurately reflect genetic relationships while maintaining biological plausibility.
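The ID-management step can be illustrated with a small stand-in. The bodies of shift_ids and get_min_id are not shown in this lab, so min_id_of and shift_negative_ids below are hypothetical helpers. They assume that only ungenotyped (negative) IDs need renumbering, because genotyped IDs are already unique across pedigrees.

```python
def min_id_of(ped: dict[int, dict[int, int]]) -> int:
    """Smallest ID appearing anywhere in the pedigree (hypothetical helper)."""
    ids = set(ped)
    for parents in ped.values():
        ids.update(parents)
    return min(ids)


def shift_negative_ids(ped, shift):
    """Renumber negative IDs by adding `shift` (a negative offset).

    Returns the renumbered pedigree and the old -> new ID map.
    """
    ids = set(ped)
    for parents in ped.values():
        ids.update(parents)
    id_map = {i: i + shift for i in ids if i < 0}
    new_ped = {
        id_map.get(child, child): {id_map.get(p, p): d for p, d in parents.items()}
        for child, parents in ped.items()
    }
    return new_ped, id_map


ped1 = {1: {-1: 1, -2: 1}, -1: {}, -2: {}}
ped2 = {2: {-1: 1}, -1: {}}

# Mirrors the listing: shift by (minimum ID of the first pedigree) - 1.
shift = min_id_of(ped1) - 1          # -2 - 1 = -3
ped2_shifted, id_map = shift_negative_ids(ped2, shift)
print(ped2_shifted)   # {2: {-4: 1}, -4: {}}
print(id_map)         # {-1: -4}
```

After the shift, every placeholder in the second pedigree has an ID below the smallest ID in the first, so the two dictionaries can be merged without accidental collisions.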

Building Basic Family Units

The foundation of pedigree construction is building basic family units: parents with children, siblings sharing parents, and so on. In Bonsai v3, these operations are implemented in the pedigrees.py module. A key function for this purpose is add_parent:

def add_parent(
    node: int,
    up_dct: dict[int, dict[int, int]],
    min_id: Optional[int]=None,
):
    """
    Add an ungenotyped parent to node in up_dct.
    """
    if node not in up_dct:
        raise BonsaiError(f"Node {node} is not in up dct.")

    pid_dict = up_dct[node]
    if len(pid_dict) >= 2:
        return up_dct, None

    if min_id is None:
        min_id = get_min_id(up_dct)

    new_pid = min_id - 1
    up_dct[node][new_pid] = 1

    return up_dct, new_pid

This function adds an ungenotyped parent (i.e., a parent for whom we don't have genetic data) to a specified node in the pedigree. It's a fundamental building block for constructing family units when we have only partial data. The function:

  1. Verifies that the node exists in the pedigree
  2. Checks that the node doesn't already have two parents (biological maximum)
  3. Finds an appropriate ID for the new parent (using negative IDs for ungenotyped individuals)
  4. Adds the parent-child relationship to the pedigree
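The four steps above can be traced on a one-person pedigree. This is a sketch under stated assumptions: get_min_id is not shown in this lab, so the version here assumes it returns the smallest ID in the pedigree, capped at 0 so that new IDs are always negative. KeyError stands in for BonsaiError to keep the block self-contained.

```python
from typing import Optional


def get_min_id(up_dct):
    """Assumed behavior: smallest ID in the pedigree, but never above 0,
    so that the next new ID (min_id - 1) is always negative."""
    ids = set(up_dct)
    for parents in up_dct.values():
        ids.update(parents)
    return min(min(ids, default=0), 0)


def add_parent(node, up_dct, min_id: Optional[int] = None):
    """Condensed copy of the listing above (mutates up_dct in place)."""
    if node not in up_dct:
        raise KeyError(f"Node {node} is not in up dct.")
    if len(up_dct[node]) >= 2:
        return up_dct, None          # already has two parents
    if min_id is None:
        min_id = get_min_id(up_dct)
    new_pid = min_id - 1
    up_dct[node][new_pid] = 1
    return up_dct, new_pid


ped = {1: {}}
ped, first = add_parent(1, ped)     # adds parent -1
ped, second = add_parent(1, ped)    # adds parent -2
ped, third = add_parent(1, ped)     # refused: two parents already
print(ped, first, second, third)    # {1: {-1: 1, -2: 1}} -1 -2 None
```

Note that the function modifies the dictionary it is given, so callers that want to keep the original pedigree should pass a copy.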

Other key functions for building family units include:

def get_partner_id_set(
    node: int,
    up_dct: dict[int, dict[int, int]],
):
    """
    Find the set of partners of node in pedigree up_dct.
    
    Partners are defined as other parents of node's children.
    """
    down_dct = reverse_node_dict(up_dct)  # Invert the up-dict: map each parent to its children
    
    # Get all children with a parent-child relationship (degree=1)
    child_id_set = {c for c, d in down_dct.get(node, {}).items() if d == 1}
    
    # For each child, get their other parents
    partner_id_set = set()
    for cid in child_id_set:
        # Get all parents of this child
        pids = {p for p, d in up_dct.get(cid, {}).items() if d == 1}
        # Add to partner set
        partner_id_set |= pids
    
    # Remove the node itself from the partner set
    partner_id_set -= {node}
    
    return partner_id_set

This function identifies partner relationships by finding other individuals who are parents of the same children. This is crucial for constructing family units because:

  • It identifies existing family structures without requiring explicit relationship encoding
  • It allows Bonsai to connect additional relatives through appropriate family units
  • It helps identify potential connection points for merging pedigrees

Together, these building block functions enable Bonsai v3 to construct basic family units that form the foundation of larger pedigree structures.
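A short usage example shows how partners fall out of shared children. The body of reverse_node_dict is not shown in this lab, so the version below is an assumed implementation that inverts child -> parents into parent -> children. partners_of is a condensed restatement of get_partner_id_set.

```python
def reverse_node_dict(up_dct):
    """Assumed behavior: invert child -> parents into parent -> children."""
    down_dct = {}
    for child, parents in up_dct.items():
        for parent, deg in parents.items():
            down_dct.setdefault(parent, {})[child] = deg
    return down_dct


def partners_of(node, up_dct):
    """Same logic as get_partner_id_set, condensed."""
    down_dct = reverse_node_dict(up_dct)
    children = {c for c, d in down_dct.get(node, {}).items() if d == 1}
    partners = set()
    for child in children:
        partners |= {p for p, d in up_dct[child].items() if d == 1}
    return partners - {node}


# Individual 1 has children with two different partners (2 and -1).
ped = {
    10: {1: 1, 2: 1},
    11: {1: 1, 2: 1},
    12: {1: 1, -1: 1},
    13: {3: 1},          # single recorded parent: 3 has no partner yet
}
print(sorted(partners_of(1, ped)))   # [-1, 2]
print(partners_of(3, ped))           # set()
print(partners_of(10, ped))          # set()
```

Individual 3 has a child but no partner because the child's second parent has not been recorded; this is the situation that partner-filling functions later in the lab address.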

Extending Lineages

Once basic family units are established, Bonsai v3 can extend these into lineages by adding ancestors or descendants. A key function for lineage extension is extend_up:

def extend_up(
    iid: int,
    deg: int,
    num_ancs: int,
    up_dct: dict[int, dict[int, int]],
):
    """
    Extend a lineage up from iid in up node dict up_dct.
    
    Args:
        iid: ID of individual to extend from
        deg: Number of generations to extend up
        num_ancs: Number of ancestors to add at the top generation (1 or 2)
        up_dct: Up-node dictionary representing the pedigree

    Returns:
        up_dct: Updated pedigree with extended lineage
        prev_id: ID of the node directly below the top ancestor (iid when deg=1, None when deg=0)
        curr_id: ID of the top, most recently added ancestor (iid when deg=0)
        part_id: ID of the top ancestor's partner (None unless num_ancs=2)
    """
    if deg == 0:
        return up_dct, None, iid, None
        
    # Get minimum ID for creating new ancestors
    min_id = get_min_id(up_dct)
    new_id = min(min_id - 1, -1)  # Ensure negative ID for ungenotyped
    
    # Initialize variables
    prev_id = None
    part_id = None
    curr_id = iid
    
    # Extend lineage upward deg generations
    while deg > 0:
        # Ensure current ID exists in pedigree
        if curr_id not in up_dct:
            up_dct[curr_id] = {}
            
        # Check if can add more parents
        if len(up_dct[curr_id]) >= 2:
            raise ValueError(f"Cannot add parent to {curr_id}, already has 2 parents")
            
        # Add new ancestor as parent
        up_dct[curr_id][new_id] = 1
        if new_id not in up_dct:
            up_dct[new_id] = {}
            
        # Add partner if this is final generation and num_ancs=2
        if deg == 1 and num_ancs == 2:
            part_id = new_id - 1
            up_dct[curr_id][part_id] = 1
            if part_id not in up_dct:
                up_dct[part_id] = {}
                
        # Move up one generation
        prev_id = curr_id
        curr_id = new_id
        new_id -= 1
        deg -= 1
        
    return up_dct, prev_id, curr_id, part_id

This function extends a lineage upward by adding ungenotyped ancestors, with the ability to add either one or two ancestors at the final generation. It's essential for building connections between individuals who are separated by multiple generations. The function:

  1. Creates a chain of ancestors for the specified number of generations
  2. Adds either a single ancestor or a pair of ancestors (e.g., both parents) at the final generation
  3. Manages IDs to ensure uniqueness and proper representation of ungenotyped individuals
  4. Returns information about the added ancestors for further pedigree construction

Through strategic use of extend_up, Bonsai v3 can build pedigree structures that span multiple generations, even when genetic data is only available for some of the individuals.
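As a worked example, the block below extends individual 1 upward by two generations with a couple at the top. It is a condensed restatement of the listing above so that it runs on its own; the inline minimum-ID computation is a stand-in for get_min_id, whose body is not shown in this lab.

```python
def extend_up(iid, deg, num_ancs, up_dct):
    """Condensed version of the listing above (mutates up_dct in place)."""
    if deg == 0:
        return up_dct, None, iid, None
    ids = set(up_dct) | {p for ps in up_dct.values() for p in ps}
    new_id = min(min(ids, default=0) - 1, -1)   # stand-in for get_min_id
    prev_id, part_id, curr_id = None, None, iid
    while deg > 0:
        up_dct.setdefault(curr_id, {})
        if len(up_dct[curr_id]) >= 2:
            raise ValueError(f"{curr_id} already has 2 parents")
        up_dct[curr_id][new_id] = 1              # add one parent
        up_dct.setdefault(new_id, {})
        if deg == 1 and num_ancs == 2:           # couple at the top only
            part_id = new_id - 1
            up_dct[curr_id][part_id] = 1
            up_dct.setdefault(part_id, {})
        prev_id, curr_id = curr_id, new_id
        new_id -= 1
        deg -= 1
    return up_dct, prev_id, curr_id, part_id


ped, below_top, top, partner = extend_up(iid=1, deg=2, num_ancs=2, up_dct={1: {}})
print(ped)                       # {1: {-1: 1}, -1: {-2: 1, -3: 1}, -2: {}, -3: {}}
print(below_top, top, partner)   # -1 -2 -3
```

Individual 1 gains a single parent (-1) and a grandparent couple (-2, -3). Only the top generation receives two ancestors, which is the shape needed to attach a relative who descends from the same couple.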

Pedigree Connection Mechanisms

Finding Connection Points

A critical step in building small pedigree structures is identifying where two individuals or pedigrees can be connected. Bonsai v3 implements this through the get_possible_connection_point_set function:

def get_possible_connection_point_set(
    ped: dict[int, dict[int, int]],
) -> set[tuple[int, Optional[int], Optional[int]]]:
    """
    Find all possible points through which a pedigree (ped) can be connected
    to another pedigree. A point is a tuple of the form (id1, id2, dir),
    where id1 is the main individual through whom the pedigree is connected
    and id2 is a possible secondary connecting individual (always a partner of id1
    if they exist). id2 can be None. dir indicates whether the pedigree is
    connected up to the other pedigree or down to the other pedigree. 0=down
    1=up.
    """
    point_set = set()
    all_ids = get_all_id_set(ped)
    for a in all_ids:
        parent_to_deg = ped.get(a, {})
        if len(parent_to_deg) < 2:
            point_set.add((a, None, 1))

        partners = get_partner_id_set(a, ped)
        point_set.add((a, None, 0))
        for partner in partners:
            if (partner, a, 0) not in point_set:  # only need one orientation
                point_set.add((a, partner, 0))
            point_set.add((a, partner, None))  # try reverse orientation

        point_set.add((a, None, None))

    return point_set

This function identifies all potential connection points in a pedigree, returning a set of tuples that specify:

  1. Primary ID (id1): The main individual through whom the connection is made
  2. Secondary ID (id2): An optional partner of the primary individual (can be None)
  3. Direction (dir): Whether the connection is upward (1), downward (0), or lateral (None)

These connection points represent all the possible ways that a pedigree can be extended or connected to another pedigree. By identifying these points, Bonsai v3 can systematically evaluate different connection options to find the one that best explains the genetic data.

The function considers several types of connection points:

  • Upward Connections: Individuals who can have additional parents added
  • Downward Connections: Individuals who can have children added
  • Partner Connections: Pairs of individuals who are partners and can have common children
  • Lateral Connections: Points where individuals can be directly replaced or connected laterally

This comprehensive approach ensures that Bonsai v3 considers all biologically plausible ways to connect pedigrees, maximizing the chance of finding the correct structure.
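To see what the point set looks like in practice, the sketch below applies the same rules to a three-person pedigree. The helpers all_ids and partners_of are simplified stand-ins for get_all_id_set and get_partner_id_set, and the loop iterates in sorted order (an addition made here) so that the partner orientation kept in the set is deterministic.

```python
def all_ids(ped):
    """Stand-in for get_all_id_set: every ID used as a child or a parent."""
    return set(ped) | {p for parents in ped.values() for p in parents}


def partners_of(node, ped):
    """Other parents of node's children."""
    children = {c for c, parents in ped.items() if parents.get(node) == 1}
    partners = set()
    for child in children:
        partners |= {p for p, d in ped[child].items() if d == 1}
    return partners - {node}


def connection_points(ped):
    """Same rules as get_possible_connection_point_set."""
    points = set()
    for a in sorted(all_ids(ped)):
        if len(ped.get(a, {})) < 2:
            points.add((a, None, 1))        # room for another parent
        points.add((a, None, 0))            # downward point is always added
        for partner in sorted(partners_of(a, ped)):
            if (partner, a, 0) not in points:
                points.add((a, partner, 0))
            points.add((a, partner, None))
        points.add((a, None, None))
    return points


# Child 3 with two genotyped parents, 1 and 2.
ped = {3: {1: 1, 2: 1}}
points = connection_points(ped)
print(len(points))                  # 11
print((3, None, 1) in points)       # False: 3 already has two parents
print((1, 2, 0) in points, (2, 1, 0) in points)   # True False
```

Even this tiny pedigree yields eleven candidate points, which is why the combination step that follows needs a scoring function to choose among them.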

Connecting and Merging Pedigrees

Once connection points are identified, Bonsai v3 can connect and merge pedigrees to form larger structures. The combine_pedigrees function orchestrates this process:

from typing import Any

def combine_pedigrees(
    up_dct1: dict[int, dict[int, int]],
    up_dct2: dict[int, dict[int, int]],
    id_to_shared_ibd: dict[tuple[int, int], list[dict]],
    id_to_info: dict[int, dict],
    pw_ll: Any,
    max_up: int = 3,
    keep_num: int = 3,
    return_many: bool = False,
):
    """
    Combine two pedigrees into one, using IBD sharing to guide the connection.
    
    Args:
        up_dct1, up_dct2: The pedigrees to combine
        id_to_shared_ibd: Dict mapping ID pairs to IBD segments
        id_to_info: Dict mapping IDs to biographical information
        pw_ll: PwLogLike instance for likelihood calculation
        max_up: Maximum number of generations to extend upward
        keep_num: Number of top combinations to keep
        return_many: Whether to return multiple possible combinations
        
    Returns:
        Combined pedigree or list of top combinations with likelihoods
    """
    # Find connections between the pedigrees based on IBD
    con_pairs = find_connecting_pairs(
        up_dct1=up_dct1,
        up_dct2=up_dct2,
        id_to_shared_ibd=id_to_shared_ibd,
        min_cm=20,
    )
    
    if not con_pairs:
        return None if not return_many else []
    
    # Get all possible connection points in each pedigree
    con_pts1 = get_possible_connection_point_set(up_dct1)
    con_pts2 = get_possible_connection_point_set(up_dct2)
    
    # Generate and evaluate all possible combinations
    top_combs = []
    for (id1, id2) in con_pairs:
        # Find connection points involving id1 and id2
        rel_con_pts1 = [pt for pt in con_pts1 if pt[0] == id1]
        rel_con_pts2 = [pt for pt in con_pts2 if pt[0] == id2]
        
        # For each pair of connection points
        for cp1 in rel_con_pts1:
            for cp2 in rel_con_pts2:
                # Try different relationship configurations
                for up in range(max_up + 1):
                    for down in range(max_up + 1):
                        if up + down > max_up:
                            continue
                        
                        # Try both 1 and 2 ancestors
                        for num_ancs in [1, 2]:
                            # Try connecting the pedigrees
                            combs = connect_pedigrees_through_points(
                                id1=cp1[0], 
                                id2=cp2[0],
                                pid1=cp1[1], 
                                pid2=cp2[1],
                                up_dct1=up_dct1, 
                                up_dct2=up_dct2,
                                deg1=up, 
                                deg2=down,
                                num_ancs=num_ancs,
                            )
                            
                            # Evaluate each combination
                            for comb in combs:
                                ll = evaluate_pedigree(
                                    ped=comb,
                                    id_to_shared_ibd=id_to_shared_ibd,
                                    id_to_info=id_to_info,
                                    pw_ll=pw_ll,
                                )
                                
                                # Add to list of top combinations
                                top_combs.append((comb, ll))
    
    # Sort by likelihood and keep top combinations
    top_combs.sort(key=lambda x: x[1], reverse=True)
    top_combs = top_combs[:keep_num]
    
    # Return results based on return_many parameter
    if return_many:
        return top_combs
    else:
        return top_combs[0][0] if top_combs else None

This function systematically:

  1. Identifies pairs of individuals that connect the two pedigrees based on IBD sharing
  2. Finds all possible connection points in each pedigree
  3. Generates potential combinations by trying different relationships between connection points
  4. Evaluates each combination based on how well it explains the observed genetic data
  5. Returns either the best combination or a list of top combinations

The function explores a comprehensive space of possible connections, considering different relationship types (varying up, down, and num_ancs parameters) and different connection points. This allows Bonsai v3 to find the most likely way to combine two pedigrees based on genetic evidence.
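The size of that search space is easy to check. The sketch below enumerates the (up, down, num_ancs) triples visited by the nested loops in the listing and, as an aid to interpreting them, computes the standard coefficient of relationship for an outbred pedigree. Both function names are hypothetical and introduced here only for illustration.

```python
from itertools import product


def relationship_configs(max_up: int = 3):
    """All (up, down, num_ancs) triples visited by the nested loops above."""
    return [
        (up, down, num_ancs)
        for up, down in product(range(max_up + 1), repeat=2)
        if up + down <= max_up
        for num_ancs in (1, 2)
    ]


def coefficient_of_relationship(up: int, down: int, num_ancs: int) -> float:
    """Expected shared fraction, assuming no inbreeding: each common
    ancestor contributes (1/2) ** (number of meioses on the path)."""
    if min(up, down) == 0 and num_ancs != 1:
        raise ValueError("direct-line relationships have one connecting ancestor")
    return num_ancs * 0.5 ** (up + down)


configs = relationship_configs(max_up=3)
print(len(configs))                            # 20
print(coefficient_of_relationship(1, 1, 2))    # 0.5   (full siblings)
print(coefficient_of_relationship(1, 1, 1))    # 0.25  (half siblings)
print(coefficient_of_relationship(2, 2, 2))    # 0.125 (first cousins)
```

With max_up=3 there are 20 configurations per pair of connection points. In the listing as written, a first-cousin connection (up=2, down=2) falls outside the default limit because up + down is 4, so a larger max_up would be needed to consider it.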

The likelihood-based approach is crucial because it allows Bonsai v3 to:

  • Handle ambiguity when multiple connection options are plausible
  • Incorporate both genetic and non-genetic information (through id_to_info)
  • Balance complexity and explanatory power in the resulting pedigree
  • Maintain multiple hypotheses when certainty is low (return_many=True)

This sophisticated mechanism for pedigree combination is at the heart of Bonsai v3's ability to reconstruct complex family structures from genetic data.

Filling in Missing Individuals

A critical aspect of small pedigree construction is filling in missing individuals to create biologically plausible structures. Bonsai v3 implements this through functions like fill_in_partners:

def fill_in_partners(
    up_dct: dict[int, dict[int, int]],
    fill_max: int = 2,
):
    """
    Fill in all missing partners in a pedigree.
    
    Args:
        up_dct: Up-node dictionary representing the pedigree
        fill_max: Maximum number of missing partners to fill in (not enforced in this simplified listing)
        
    Returns:
        up_dct: Updated pedigree with partners filled in
    """
    up_dct = copy.deepcopy(up_dct)
    
    # Find individuals with exactly one parent. Iterate over a snapshot,
    # because new partner nodes are added to up_dct inside the loop.
    for node, parents in list(up_dct.items()):
        if len(parents) == 1:
            parent = list(parents.keys())[0]
            
            # Check if this parent already has a partner who is parent to this node
            partners = get_partner_id_set(parent, up_dct)
            needs_partner = True
            
            for partner in partners:
                if partner in parents:
                    needs_partner = False
                    break
            
            # If no existing partner is parent to this node, add a new one
            if needs_partner:
                min_id = get_min_id(up_dct)
                new_partner = min_id - 1
                
                # Add the new partner as a parent of the node
                up_dct[node][new_partner] = 1
                
                # Add the new partner to the pedigree
                if new_partner not in up_dct:
                    up_dct[new_partner] = {}
    
    return up_dct

This function identifies individuals who have only one parent recorded and adds a second parent to create biologically complete family units. This is important because:

  • It ensures pedigrees respect biological reality (everyone has two biological parents)
  • It creates more complete structures for relationship inference and connection
  • It provides placeholders that can be identified with real individuals as more data becomes available

Other functions for filling in missing individuals include fill_in_lineages (which creates complete ancestral lineages) and fill_in_siblings (which adds ungenotyped siblings to create more complete family units).

These functions help Bonsai v3 build small pedigree structures that are both genetically consistent and biologically complete, even when the genetic data is incomplete.
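The partner-filling idea can be reduced to a few lines. The function fill_second_parents below is a hypothetical, simplified sketch rather than Bonsai's implementation. It assumes new placeholders take the next unused negative ID, and it iterates over a snapshot of the keys because Python raises a RuntimeError when a dictionary changes size during iteration.

```python
def fill_second_parents(up_dct):
    """Give every node with exactly one recorded parent an ungenotyped
    second parent. Returns a new pedigree; the input is left untouched."""
    filled = {node: dict(parents) for node, parents in up_dct.items()}
    ids = set(filled) | {p for ps in filled.values() for p in ps}
    next_id = min(min(ids, default=0), 0) - 1   # next unused negative ID

    # Iterate over a snapshot of the keys: placeholder nodes are added
    # to `filled` inside the loop.
    for node in list(filled):
        if len(filled[node]) == 1:
            filled[node][next_id] = 1
            filled[next_id] = {}
            next_id -= 1
    return filled


ped = {10: {1: 1}, 11: {1: 1, 2: 1}, 1: {}, 2: {}}
print(fill_second_parents(ped))
# {10: {1: 1, -1: 1}, 11: {1: 1, 2: 1}, 1: {}, 2: {}, -1: {}}
```

Founders with no recorded parents are left alone; only half-specified parentage is completed.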

Evaluating Pedigree Structures

The Likelihood Framework

A central aspect of Bonsai v3's pedigree construction is evaluating how well different structures explain the observed genetic data. This is implemented through a likelihood framework in the likelihoods.py module:

def get_ped_like(
    up_dct: dict[int, dict[int, int]],
    id_to_shared_ibd: dict[tuple[int, int], list[dict]],
    id_to_info: dict[int, dict],
    pw_ll: Any,
):
    """
    Calculate the likelihood of a pedigree given IBD data.
    
    Args:
        up_dct: Up-node dictionary representing the pedigree
        id_to_shared_ibd: Dict mapping ID pairs to IBD segments
        id_to_info: Dict mapping IDs to biographical information
        pw_ll: PwLogLike instance for likelihood calculation
        
    Returns:
        log_like: Log-likelihood of the pedigree
    """
    log_like = 0.0
    
    # Get all genotyped individuals in the pedigree
    all_ids = get_all_id_set(up_dct)
    gen_ids = [i for i in all_ids if i > 0]  # Genotyped IDs are positive
    
    # For each pair of genotyped individuals
    for i in range(len(gen_ids)):
        for j in range(i+1, len(gen_ids)):
            id1, id2 = gen_ids[i], gen_ids[j]
            
            # Skip if no IBD data for this pair
            pair = (min(id1, id2), max(id1, id2))
            if pair not in id_to_shared_ibd:
                continue
                
            # Get IBD data for this pair
            ibd_segs = id_to_shared_ibd[pair]
            
            # Get relationship from pedigree
            rel_tuple = get_simple_rel_tuple(up_dct, id1, id2)
            
            # Calculate likelihood of this relationship given IBD
            pair_ll = pw_ll.get_ibd_log_like(
                id1=id1,
                id2=id2,
                rel_tuple=rel_tuple,
                ibd_segs=ibd_segs,
            )
            
            # Add to total log-likelihood
            log_like += pair_ll
    
    # Add penalty for improbable age relationships
    age_ll = get_age_log_like(up_dct, id_to_info)
    log_like += age_ll
    
    return log_like

This function calculates the overall log-likelihood of a pedigree by:

  1. Identifying all pairs of genotyped individuals in the pedigree
  2. Determining the relationship between each pair based on the pedigree structure
  3. Calculating how well that relationship explains the observed IBD segments
  4. Adding penalties for biologically implausible age relationships
  5. Combining these components into a total log-likelihood score

The likelihood framework allows Bonsai v3 to quantitatively compare different pedigree structures, selecting the one that best explains the observed genetic data while respecting biological constraints.
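Written as a formula, the computation in get_ped_like is a sum of pairwise terms plus an age term. The notation is introduced here for illustration and does not come from the Bonsai source: P is the pedigree, G the set of genotyped individuals, D_ij the IBD segments observed for the pair (i, j), r_ij(P) the relationship that P implies for that pair, and A the age information in id_to_info.

```latex
\log L(P) = \sum_{\substack{i, j \in G,\ i < j \\ D_{ij}\ \text{available}}}
            \log \Pr\bigl(D_{ij} \mid r_{ij}(P)\bigr)
          + \log \Pr\bigl(A \mid P\bigr)
```

Summing pairwise terms treats the pairs as if they were independent, and in the listing above pairs without an entry in id_to_shared_ibd contribute nothing to the sum.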

This approach has several key advantages:

  • It provides a principled basis for choosing between alternative pedigree structures
  • It naturally incorporates uncertainty in the genetic and biographical data
  • It balances genetic evidence against biological plausibility
  • It can be extended to include additional types of evidence as they become available

By using this likelihood framework, Bonsai v3 can build small pedigree structures that optimally explain the available data, serving as reliable building blocks for larger pedigree reconstructions.

Comparing Alternative Structures

When building small pedigree structures, there are often multiple plausible configurations that could explain the observed genetic data. Bonsai v3 systematically compares these alternatives through functions like evaluate_pedigree_set:

def evaluate_pedigree_set(
    pedigree_set: list[dict[int, dict[int, int]]],
    id_to_shared_ibd: dict[tuple[int, int], list[dict]],
    id_to_info: dict[int, dict],
    pw_ll: Any,
):
    """
    Evaluate a set of alternative pedigree structures.
    
    Args:
        pedigree_set: List of pedigrees to evaluate
        id_to_shared_ibd: Dict mapping ID pairs to IBD segments
        id_to_info: Dict mapping IDs to biographical information
        pw_ll: PwLogLike instance for likelihood calculation
        
    Returns:
        evaluated_pedigrees: List of (pedigree, log_like) tuples, sorted by likelihood
    """
    evaluated_pedigrees = []
    
    # Evaluate each pedigree
    for ped in pedigree_set:
        # Calculate log-likelihood
        log_like = get_ped_like(
            up_dct=ped,
            id_to_shared_ibd=id_to_shared_ibd,
            id_to_info=id_to_info,
            pw_ll=pw_ll,
        )
        
        # Add to list of evaluated pedigrees
        evaluated_pedigrees.append((ped, log_like))
    
    # Sort by likelihood (highest first)
    evaluated_pedigrees.sort(key=lambda x: x[1], reverse=True)
    
    return evaluated_pedigrees

This function evaluates a set of alternative pedigree structures and returns them sorted by likelihood. This is crucial for Bonsai v3's approach to pedigree construction because:

  • It allows exploration of multiple hypotheses about how individuals might be related
  • It provides a principled basis for selecting the most likely structure
  • It quantifies the confidence in different possible structures
  • It allows for maintaining alternative hypotheses when the evidence is ambiguous

By systematically comparing alternative structures, Bonsai v3 can identify the one that best explains the genetic data, while also maintaining awareness of other plausible configurations. This approach is essential for robust pedigree reconstruction, as it prevents premature commitment to potentially incorrect structures.

When constructing complex pedigrees, Bonsai v3 often maintains multiple hypotheses throughout the process, only committing to specific structures when the evidence strongly supports them. This balanced approach helps navigate the inherent uncertainty in genetic genealogy, producing results that accurately reflect the confidence supported by the available data.
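One simple way to express confidence across alternatives is to normalize their log-likelihoods into relative weights. The sketch below is an illustration and not a Bonsai function: relative_weights is a hypothetical name, the log-likelihood values are made up, and the normalization assumes every candidate structure is equally plausible a priori.

```python
import math


def relative_weights(log_likes: list[float]) -> list[float]:
    """Normalize log-likelihoods into weights that sum to 1.

    Subtracting the maximum before exponentiating avoids underflow
    when the log-likelihoods are large and negative.
    """
    if not log_likes:
        return []
    top = max(log_likes)
    scaled = [math.exp(ll - top) for ll in log_likes]
    total = sum(scaled)
    return [s / total for s in scaled]


# Three candidate pedigrees with made-up log-likelihoods.
weights = relative_weights([-1200.0, -1201.0, -1210.0])
print([round(w, 3) for w in weights])   # [0.731, 0.269, 0.0]
```

A gap of one log-likelihood unit leaves the runner-up with roughly a quarter of the weight, which is a case where keeping both hypotheses is reasonable; a gap of ten units effectively rules the third candidate out.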

Optimizing Small Structures

Beyond evaluating fixed pedigree structures, Bonsai v3 can optimize small structures to better explain the genetic data. This is implemented through functions like optimize_pedigree:

def optimize_pedigree(
    up_dct: dict[int, dict[int, int]],
    id_to_shared_ibd: dict[tuple[int, int], list[dict]],
    id_to_info: dict[int, dict],
    pw_ll: Any,
    max_iterations: int = 10,
):
    """
    Optimize a small pedigree structure to better explain the genetic data.
    
    Args:
        up_dct: Initial pedigree to optimize
        id_to_shared_ibd: Dict mapping ID pairs to IBD segments
        id_to_info: Dict mapping IDs to biographical information
        pw_ll: PwLogLike instance for likelihood calculation
        max_iterations: Maximum number of optimization iterations
        
    Returns:
        optimized_ped: Optimized pedigree structure
        final_ll: Log-likelihood of the optimized pedigree
    """
    current_ped = copy.deepcopy(up_dct)
    current_ll = get_ped_like(current_ped, id_to_shared_ibd, id_to_info, pw_ll)
    
    # Optimization iterations
    for _ in range(max_iterations):
        # Generate variations of the current pedigree
        variations = generate_pedigree_variations(current_ped)
        
        # Evaluate all variations
        best_var = current_ped
        best_ll = current_ll
        
        for var in variations:
            var_ll = get_ped_like(var, id_to_shared_ibd, id_to_info, pw_ll)
            if var_ll > best_ll:
                best_var = var
                best_ll = var_ll
        
        # If no improvement, stop optimization
        if best_ll <= current_ll:
            break
            
        # Update current pedigree and likelihood
        current_ped = best_var
        current_ll = best_ll
    
    return current_ped, current_ll

This function implements an iterative optimization process that:

  1. Starts with an initial pedigree structure
  2. Generates variations by making small changes to the structure
  3. Evaluates each variation to find the one with the highest likelihood
  4. Updates the current structure if an improvement is found
  5. Repeats until no further improvement is possible or the maximum iterations are reached

The generation of variations is handled by generate_pedigree_variations, which creates alternative structures by:

  • Adding or removing ungenotyped individuals
  • Changing relationship types between individuals
  • Rearranging connections to form different family structures
  • Merging or splitting family units

This optimization approach allows Bonsai v3 to refine initial pedigree structures to better explain the genetic data, creating small structures that more accurately reflect the true relationships between individuals.
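The control flow of optimize_pedigree is ordinary greedy hill climbing, which can be isolated from the pedigree details. The sketch below is a generic version with a toy objective; hill_climb, toy_score, and toy_neighbors are hypothetical names, and the integer state merely stands in for a pedigree structure.

```python
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")


def hill_climb(start: S, neighbors: Callable[[S], Iterable[S]],
               score: Callable[[S], float], max_iterations: int = 10):
    """Greedy ascent: move to the best-scoring neighbor until none improves."""
    current, current_score = start, score(start)
    for _ in range(max_iterations):
        best, best_score = current, current_score
        for candidate in neighbors(current):
            s = score(candidate)
            if s > best_score:
                best, best_score = candidate, s
        if best_score <= current_score:
            break                      # no neighbor improves: local optimum
        current, current_score = best, best_score
    return current, current_score


# Toy stand-in: the "structure" is a single integer (say, the number of
# meioses separating two people) and the score peaks at 4.
toy_score = lambda m: -(m - 4) ** 2
toy_neighbors = lambda m: [m - 1, m + 1]

print(hill_climb(1, toy_neighbors, toy_score))                      # (4, 0)
print(hill_climb(1, toy_neighbors, toy_score, max_iterations=2))    # (3, -1)
```

Because each step only accepts improvements among immediate neighbors, this kind of search can stop at a local optimum; the quality of the result depends on how varied the generated variations are and on the starting structure.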

Core Component: Bonsai v3's ability to connect individuals into small pedigree structures forms the foundation of its pedigree reconstruction capabilities. By systematically identifying connection points, evaluating alternative structures, and optimizing small pedigrees, Bonsai can build biologically plausible family units that accurately explain the genetic relationships between individuals. These small structures serve as the building blocks for larger, more complex pedigree reconstructions.

Comparing Notebook and Production Code

The Lab 13 notebook is a simplified, educational introduction to the key pedigree construction concepts. The production implementation in Bonsai v3 adds further sophistication, reflecting years of refinement to handle the complexities of real-world genetic data and family structures.

Interactive Lab Environment

Run the interactive Lab 13 notebook in Google Colab. Data will be automatically downloaded from S3 when you run the notebook.

Note: You may need a Google account to save your work in Google Drive.

Beyond the Code

As you explore the mechanisms for connecting individuals into small pedigree structures, consider the broader implications. Connecting individuals into pedigrees is not just a technical challenge; it also has significant social, historical, and ethical dimensions that must be weighed in applications of computational genetic genealogy.
