Lab 14: Optimizing Pedigrees | Computational Genetic Genealogy

Lab 14: Optimizing Small Pedigree Configurations

Core Component: This lab explores how Bonsai v3 optimizes small pedigree configurations to identify the most likely structure that explains observed genetic data. This optimization process is a critical step in pedigree reconstruction, as it determines which configurations best match biological reality among many possible alternatives.

The Challenge of Finding Optimal Configurations

The Optimization Problem

In genetic genealogy, determining the optimal pedigree configuration presents several challenges:

Combinatorial Explosion: The number of possible pedigree configurations grows exponentially with the number of individuals
Ambiguous Evidence: Different relationship types can produce similar genetic sharing patterns
Incomplete Data: Missing genotype data or limited IBD detection can create uncertainty
Multiple Plausible Solutions: Various pedigree structures may explain the observed data equally well
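
To get a feel for the combinatorial explosion, the sketch below counts a crude upper bound: the number of ways to assign one of k candidate relationship labels independently to every pair among n individuals. Most of these assignments are not mutually consistent pedigrees, so the figure overcounts. The function name and the choice of k = 20 are illustrative assumptions, not part of Bonsai.

```python
from math import comb


def pairwise_labelings(n_individuals: int, n_labels: int) -> int:
    """Upper bound: label each unordered pair independently with one of n_labels."""
    n_pairs = comb(n_individuals, 2)
    return n_labels ** n_pairs


# Growth of the bound for a hypothetical set of 20 relationship labels
for n in (2, 3, 4, 5, 6):
    print(n, pairwise_labelings(n, 20))
```

Even under this rough count, going from three to six individuals raises the exponent from 3 to 15, which is why the search has to prune rather than enumerate.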

Bonsai v3 addresses these challenges with an incremental, likelihood-guided search implemented in its connections.py and pedigrees.py modules. The central optimization function is combine_up_dicts:

from typing import Any, Optional

def combine_up_dicts(
    id_to_up_dct: dict[int, dict[int, dict[int, int]]],
    id_to_shared_ibd: dict[tuple[int, int], list[dict]],
    id_to_info: dict[int, dict],
    pw_ll: Any,
    max_up: int = 3,
    n_keep: int = 5,
    id_set_to_exclude: Optional[set[int]] = None,
):
    """
    Combine all pedigrees in id_to_up_dct incrementally.
    
    Args:
        id_to_up_dct: Dictionary mapping IDs to their up-node dictionaries
        id_to_shared_ibd: Dictionary mapping pairs of IDs to their IBD segments
        id_to_info: Dictionary with demographic information for individuals
        pw_ll: PwLogLike instance for likelihood calculation
        max_up: Maximum number of generations to extend upward
        n_keep: Number of top pedigrees to keep at each step
        id_set_to_exclude: Set of IDs to exclude from combination
        
    Returns:
        List of optimized pedigrees with their likelihoods
    """
    # Identify individuals with IBD sharing
    id_set = get_ids_with_sharing(id_to_shared_ibd)
    
    # Filter out excluded IDs
    if id_set_to_exclude is not None:
        id_set = id_set - id_set_to_exclude
    
    # If no relevant IDs, return empty result
    if not id_set:
        return []
    
    # Get individual pedigrees to combine
    id_list = sorted(id_set)
    
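    # Sort pairs by amount of IBD sharing (most to least).
    # Note: this simplified listing computes pair_list but never reads it;
    # the loop below adds individuals in sorted-ID order.
    pair_list = sort_pairs_by_ibd(id_to_shared_ibd, id_list)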
    
    # Initialize with the first individual's pedigree
    up_dct_list = [(id_to_up_dct[id_list[0]], 0.0)]
    
    # Incrementally build the pedigree by combining with additional individuals
    for i in range(1, len(id_list)):
        new_id = id_list[i]
        new_up_dct = id_to_up_dct[new_id]
        
        # Combine existing pedigrees with the new individual
        combined_list = []
        for up_dct, ll in up_dct_list:
            # Try different ways to combine pedigrees
            combinations = combine_pedigrees(
                up_dct1=up_dct,
                up_dct2=new_up_dct,
                id_to_shared_ibd=id_to_shared_ibd,
                id_to_info=id_to_info,
                pw_ll=pw_ll,
                max_up=max_up,
                keep_num=n_keep,
                return_many=True,
            )
            
            # Add all combinations to the list
            combined_list.extend(combinations)
        
        # If no valid combinations were found, skip this individual;
        # it is then absent from the pedigrees that are returned
        if not combined_list:
            continue
            
        # Sort by likelihood and keep top n_keep
        combined_list.sort(key=lambda x: x[1], reverse=True)
        up_dct_list = combined_list[:n_keep]
    
    return up_dct_list

This function orchestrates the incremental optimization process, which follows these key steps:

Identify individuals with IBD sharing, optionally removing any excluded IDs
Sort pairs by amount of IBD sharing (in this listing the sorted pair_list is computed, but the loop itself adds individuals in sorted-ID order)
Build pedigrees incrementally, starting with one individual and adding others
Evaluate multiple ways to combine pedigrees at each step
Keep only the n_keep most likely pedigree configurations after each step
Return the surviving pedigrees with their calculated likelihoods

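This incremental approach allows Bonsai v3 to handle the combinatorial complexity of pedigree optimization while focusing computational resources on the most promising configurations. Because only the n_keep best candidates survive each step, the procedure behaves like a beam search: it gives up the guarantee of an exhaustive search in exchange for tractable running time, so a configuration that scores poorly early on can be pruned even if it would have turned out best.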
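
The keep-the-top-n_keep pattern can be shown in isolation. The following is a minimal sketch of a generic beam search on a toy labelling problem; expand, score, and TARGET are hypothetical stand-ins for combine_pedigrees and the pedigree likelihood, not Bonsai functions.

```python
from typing import Callable, Iterable

State = tuple[str, ...]


def beam_search(
    items: Iterable[int],
    expand: Callable[[State, int], list[State]],
    score: Callable[[State], float],
    n_keep: int = 5,
) -> list[tuple[State, float]]:
    """Add one item at a time, keeping only the n_keep best partial states."""
    beam: list[tuple[State, float]] = [((), 0.0)]
    for item in items:
        candidates = [
            (new_state, score(new_state))
            for state, _ in beam
            for new_state in expand(state, item)
        ]
        if not candidates:
            continue  # nothing valid for this item; keep the current beam
        candidates.sort(key=lambda pair: pair[1], reverse=True)
        beam = candidates[:n_keep]
    return beam


# Toy problem: choose a label for each of four items. The score counts
# mismatches against a hidden target (higher is better, like a log-likelihood).
TARGET = ("A", "B", "A", "C")


def expand(state: State, item: int) -> list[State]:
    return [state + (label,) for label in "ABC"]


def score(state: State) -> float:
    return float(-sum(1 for got, want in zip(state, TARGET) if got != want))


beam = beam_search(range(4), expand, score, n_keep=2)
best_state, best_score = beam[0]
print(best_state, best_score)
```

With n_keep=2 the search scores at most six candidates per step instead of all 81 full labellings. It still recovers the target here because the toy score is additive across positions; a real likelihood offers no such guarantee.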

Evaluating Pedigree Configurations

At the heart of pedigree optimization is the ability to evaluate how well different configurations explain the observed genetic data. In Bonsai v3, this is implemented in the get_ped_like function:

def get_ped_like(
    up_dct: dict[int, dict[int, int]],
    id_to_shared_ibd: dict[tuple[int, int], list[dict]],
    id_to_info: dict[int, dict],
    pw_ll: Any,
):
    """
    Calculate the log-likelihood of a pedigree given IBD data.
    
    Args:
        up_dct: Up-node dictionary representing the pedigree
        id_to_shared_ibd: Dict mapping ID pairs to IBD segments
        id_to_info: Dict mapping IDs to demographic information
        pw_ll: PwLogLike instance for likelihood calculation
        
    Returns:
        log_like: Log-likelihood of the pedigree
    """
    log_like = 0.0
    
    # Get all genotyped individuals in the pedigree
    genotyped_ids = get_genotyped_ids(up_dct)
    
    # For each pair of genotyped individuals
    for i, id1 in enumerate(genotyped_ids):
        for id2 in genotyped_ids[i+1:]:
            # Skip if no IBD data for this pair
            pair = (min(id1, id2), max(id1, id2))
            if pair not in id_to_shared_ibd:
                continue
                
            # Get IBD data for this pair
            ibd_segs = id_to_shared_ibd[pair]
            
            # Get relationship from pedigree
            rel_tuple = get_simple_rel_tuple(up_dct, id1, id2)
            
            # Calculate likelihood of this relationship
            pair_ll = pw_ll.get_ibd_log_like(
                id1=id1,
                id2=id2,
                rel_tuple=rel_tuple,
                ibd_segs=ibd_segs,
            )
            
            # Add to total log-likelihood
            log_like += pair_ll
    
    # Add age-based likelihood component
    age_ll = get_age_log_like(up_dct, id_to_info)
    log_like += age_ll
    
    # Add structural likelihood component
    struct_ll = get_structural_log_like(up_dct)
    log_like += struct_ll
    
    return log_like

This function computes the total log-likelihood of a pedigree by combining several components:

Genetic Component: How well the pedigree explains observed IBD patterns between pairs of individuals
Age Component: How well the pedigree respects age constraints (e.g., parents older than children)
Structural Component: How biologically plausible the pedigree structure is

The genetic component is calculated using the PwLogLike class, which implements sophisticated statistical models for the expected patterns of IBD sharing between different types of relatives. These models account for:

Total Shared IBD: The total amount of DNA shared (in centimorgans)
Number of Segments: How many distinct IBD segments are shared
Segment Length Distribution: The pattern of segment lengths characteristic of different relationships
IBD2 Regions: Regions where the two individuals share both haplotypes identical by descent (important for distinguishing full siblings from other close relatives)

By combining these components, Bonsai v3 can evaluate pedigree configurations based on both genetic evidence and biological constraints, making it a powerful tool for pedigree optimization.
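
As a rough illustration of how pairwise terms add up to a pedigree score, the sketch below replaces the PwLogLike model with a toy Gaussian on total shared cM. Everything here is an assumption made for illustration: the 3400 cM map length, the 20% relative standard deviation, and the function names. It is not the model Bonsai uses.

```python
import math

GENOME_CM = 3400.0  # assumed rough autosomal map length in cM (illustrative)


def expected_total_cm(up: int, down: int, num_ancs: int) -> float:
    """Expected total shared cM (IBD2 counted twice) for a relationship tuple."""
    meioses = up + down
    return GENOME_CM * num_ancs * 2.0 ** (1 - meioses)


def toy_pair_log_like(total_cm: float, rel_tuple: tuple[int, int, int],
                      rel_sd: float = 0.2) -> float:
    """Gaussian log-density of the observed total around the expected total."""
    mean = expected_total_cm(*rel_tuple)
    sd = rel_sd * mean
    z = (total_cm - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def toy_ped_log_like(pair_to_cm: dict, pair_to_rel: dict) -> float:
    """Sum the pairwise terms over every pair that has a relationship assigned."""
    return sum(
        toy_pair_log_like(cm, pair_to_rel[pair])
        for pair, cm in pair_to_cm.items()
        if pair in pair_to_rel
    )


observed_cm = {(1, 2): 3390.0, (2, 3): 3405.0, (1, 3): 1650.0}
# Hypothesis A: 2 is a parent of 1, and 3 is a parent of 2
hypothesis_a = {(1, 2): (1, 0, 1), (2, 3): (1, 0, 1), (1, 3): (2, 0, 1)}
# Hypothesis B: 2 is a parent of 1, and 3 is a half sibling of 2
hypothesis_b = {(1, 2): (1, 0, 1), (2, 3): (1, 1, 1), (1, 3): (2, 1, 1)}

ll_a = toy_ped_log_like(observed_cm, hypothesis_a)
ll_b = toy_ped_log_like(observed_cm, hypothesis_b)
print(ll_a, ll_b)
```

Under this toy model parent-child (1, 0, 1) and full siblings (1, 1, 2) have the same expected total, which is one reason the number of segments and the IBD2 regions listed above matter in a realistic model.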

Generating Alternative Configurations

An essential part of pedigree optimization is generating alternative configurations to explore the space of possible pedigrees. Bonsai v3 handles this through the combine_pedigrees function:

def combine_pedigrees(
    up_dct1: dict[int, dict[int, int]],
    up_dct2: dict[int, dict[int, int]],
    id_to_shared_ibd: dict[tuple[int, int], list[dict]],
    id_to_info: dict[int, dict],
    pw_ll: Any,
    max_up: int = 3,
    keep_num: int = 3,
    return_many: bool = False,
):
    """
    Combine two pedigrees into one, using IBD sharing to guide the connection.
    
    Args:
        up_dct1, up_dct2: The pedigrees to combine
        id_to_shared_ibd: Dict mapping ID pairs to IBD segments
        id_to_info: Dict mapping IDs to biographical information
        pw_ll: PwLogLike instance for likelihood calculation
        max_up: Maximum number of generations to extend upward
        keep_num: Number of top combinations to keep
        return_many: Whether to return multiple possible combinations
        
    Returns:
        Combined pedigree or list of top combinations with likelihoods
    """
    # Find pairs of individuals that connect the pedigrees
    con_pairs = find_connecting_pairs(
        up_dct1=up_dct1,
        up_dct2=up_dct2,
        id_to_shared_ibd=id_to_shared_ibd,
    )
    
    if not con_pairs:
        return [] if return_many else None
    
    # Get connection points in each pedigree
    con_pts1 = get_possible_connection_point_set(up_dct1)
    con_pts2 = get_possible_connection_point_set(up_dct2)
    
    # Identify likely connection points based on IBD sharing
    likely_pts1 = get_likely_con_pt_set(up_dct1, id_to_shared_ibd, con_pts1)
    likely_pts2 = get_likely_con_pt_set(up_dct2, id_to_shared_ibd, con_pts2)
    
    # Generate and evaluate different ways to connect the pedigrees
    all_combinations = []
    for (id1, id2) in con_pairs:
        for cp1 in likely_pts1:
            if cp1[0] != id1:
                continue
                
            for cp2 in likely_pts2:
                if cp2[0] != id2:
                    continue
                    
                # Try different relationship configurations
                for up in range(max_up + 1):
                    for down in range(max_up + 1):
                        if up + down > max_up:
                            continue
                            
                        # Try both one and two common ancestors
                        for num_ancs in [1, 2]:
                            # Connect the pedigrees with this configuration
                            combinations = connect_pedigrees_through_points(
                                id1=cp1[0], 
                                id2=cp2[0],
                                pid1=cp1[1], 
                                pid2=cp2[1],
                                up_dct1=up_dct1, 
                                up_dct2=up_dct2,
                                deg1=up, 
                                deg2=down,
                                num_ancs=num_ancs,
                            )
                            
                            # Evaluate each combination
                            for comb in combinations:
                                ll = get_ped_like(
                                    up_dct=comb,
                                    id_to_shared_ibd=id_to_shared_ibd,
                                    id_to_info=id_to_info,
                                    pw_ll=pw_ll,
                                )
                                
                                all_combinations.append((comb, ll))
    
    # Sort by likelihood and keep top combinations
    all_combinations.sort(key=lambda x: x[1], reverse=True)
    top_combinations = all_combinations[:keep_num]
    
    # Return results based on return_many parameter
    if return_many:
        return top_combinations
    else:
        return top_combinations[0][0] if top_combinations else None

This function systematically generates alternative pedigree configurations by:

Identifying pairs of individuals that connect the pedigrees based on IBD sharing
Finding all possible connection points in each pedigree
Restricting to likely connection points based on IBD sharing patterns
Generating different relationship configurations by varying the number of generations up and down
Trying different numbers of common ancestors (1 or 2) for the connection
Evaluating each configuration based on its likelihood
Returning either the best configuration or a ranked list of top configurations

Restricting the search to likely connection points keeps the number of candidates manageable. Within that restricted set, the function systematically varies the key parameters that define a relationship:

Number of generations up (up): How many generations separate the first connection point from the common ancestor(s)
Number of generations down (down): How many generations separate the common ancestor(s) from the second connection point
Number of common ancestors (num_ancs): Whether the connection involves one or two common ancestors

By varying these parameters, Bonsai v3 generates a configuration for every relationship between the two connection points with up + down <= max_up. With the default max_up = 3 this covers parent-child, sibling, grandparent, avuncular, and great-grandparent connections; a first-cousin connection (up = 2, down = 2) requires max_up of at least 4.
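
The size of this parameter grid is easy to check directly. The sketch below repeats the three nested loops from combine_pedigrees in a standalone helper (relationship_tuples is a hypothetical name) and lists the (up, down, num_ancs) tuples they produce.

```python
def relationship_tuples(max_up: int = 3) -> list[tuple[int, int, int]]:
    """Enumerate (up, down, num_ancs) exactly as the nested loops do."""
    tuples = []
    for up in range(max_up + 1):
        for down in range(max_up + 1):
            if up + down > max_up:
                continue
            for num_ancs in (1, 2):
                tuples.append((up, down, num_ancs))
    return tuples


configs = relationship_tuples(3)
print(len(configs))           # number of tuples tried per pair of connection points
print((2, 2, 2) in configs)   # first cousins through two shared ancestors
print((2, 2, 2) in relationship_tuples(4))
```

The loops as written also emit tuples such as (0, 0, 1), which describe no meaningful connection; presumably the connecting function returns no pedigrees for them, but that behavior is not shown in the listing.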

Optimizing With Multiple Constraints

Integrating Genetic and Non-Genetic Evidence

Bonsai v3 optimizes pedigree configurations by integrating multiple types of evidence, including both genetic and non-genetic information. This is implemented through a composite likelihood approach:

def evaluate_up_dict_with_likelihood_components(
    up_dct: dict[int, dict[int, int]],
    id_to_shared_ibd: dict[tuple[int, int], list[dict]],
    id_to_info: dict[int, dict],
    pw_ll: Any,
    return_components: bool = False,
):
    """
    Evaluate a pedigree with separate likelihood components.
    
    Args:
        up_dct: Up-node dictionary representing the pedigree
        id_to_shared_ibd: Dict mapping ID pairs to IBD segments
        id_to_info: Dict mapping IDs to demographic information
        pw_ll: PwLogLike instance for likelihood calculation
        return_components: Whether to return individual components
        
    Returns:
        Total log-likelihood or tuple of (total, components)
    """
    # Calculate genetic likelihood component
    genetic_ll = 0.0
    for pair, ibd_segs in id_to_shared_ibd.items():
        id1, id2 = pair
        
        # Skip if either individual is not in pedigree
        if id1 not in up_dct or id2 not in up_dct:
            continue
            
        # Get relationship from pedigree
        rel_tuple = get_simple_rel_tuple(up_dct, id1, id2)
        
        # Calculate likelihood for this pair
        pair_ll = pw_ll.get_ibd_log_like(
            id1=id1,
            id2=id2,
            rel_tuple=rel_tuple,
            ibd_segs=ibd_segs,
        )
        
        genetic_ll += pair_ll
    
    # Calculate age likelihood component
    age_ll = 0.0
    if id_to_info:
        age_ll = get_age_log_like(up_dct, id_to_info)
    
    # Calculate structural likelihood component
    struct_ll = get_structural_log_like(up_dct)
    
    # Calculate sex constraint likelihood component
    sex_ll = 0.0
    if id_to_info:
        sex_ll = get_sex_constraint_log_like(up_dct, id_to_info)
    
    # Combine components with appropriate weights
    total_ll = (0.7 * genetic_ll + 
                0.15 * age_ll + 
                0.1 * struct_ll + 
                0.05 * sex_ll)
    
    if return_components:
        components = {
            'genetic': genetic_ll,
            'age': age_ll,
            'structural': struct_ll,
            'sex': sex_ll
        }
        return total_ll, components
    else:
        return total_ll

This function calculates a weighted sum of several log-likelihood components (the percentages are the weights applied to each term):

Genetic Likelihood (70%): How well the pedigree explains observed IBD patterns
Age Likelihood (15%): How well the pedigree respects age constraints
Structural Likelihood (10%): How biologically plausible the structure is
Sex Constraint Likelihood (5%): How well the pedigree respects sex constraints

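By weighting these components, Bonsai v3 can prioritize genetic evidence while still ensuring that the resulting pedigrees respect biological constraints. This is crucial for resolving ambiguities in cases where multiple configurations might explain the genetic data equally well. Note that get_ped_like sums its components without weights, so scores from the two functions are on different scales and should not be compared with each other directly.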
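
A small worked example shows how the non-genetic terms break a tie. The weights below are the ones from the listing; the component values for the two candidates are made-up numbers chosen so that the genetic terms are identical.

```python
WEIGHTS = {"genetic": 0.7, "age": 0.15, "structural": 0.1, "sex": 0.05}


def weighted_total(components: dict[str, float]) -> float:
    """Weighted sum of log-likelihood components."""
    return sum(WEIGHTS[name] * value for name, value in components.items())


# Two hypothetical candidates with the same genetic fit but different age fit
half_siblings = {"genetic": -100.0, "age": 0.0, "structural": -2.0, "sex": 0.0}
grandparent = {"genetic": -100.0, "age": -50.0, "structural": -2.0, "sex": 0.0}

total_half = weighted_total(half_siblings)
total_grand = weighted_total(grandparent)
print(total_half, total_grand)
```

The genetic term contributes -70 to both totals, so the ranking is decided entirely by the age penalty, which costs the second candidate 0.15 * 50 = 7.5 log-likelihood units.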

The age-based component is particularly important for distinguishing relationships that have similar IBD patterns but different generational structures, such as half-siblings vs. grandparent-grandchild. This is implemented in the get_age_log_like function:

def get_age_log_like(
    up_dct: dict[int, dict[int, int]],
    id_to_info: dict[int, dict],
):
    """
    Calculate log-likelihood of a pedigree based on age constraints.
    
    Args:
        up_dct: Up-node dictionary representing the pedigree
        id_to_info: Dict mapping IDs to demographic information
        
    Returns:
        log_like: Log-likelihood based on age constraints
    """
    # Plausibility bounds on the parent-child age gap (in years)
    MIN_PARENT_AGE = 15
    MAX_AGE_DIFF = 100
    
    log_like = 0.0
    
    # For each parent-child relationship
    for child, parents in up_dct.items():
        if child not in id_to_info:
            continue
            
        child_age = id_to_info[child].get('age')
        if child_age is None:
            continue
            
        for parent in parents:
            if parent not in id_to_info:
                continue
                
            parent_age = id_to_info[parent].get('age')
            if parent_age is None:
                continue
                
            # Calculate age difference
            age_diff = parent_age - child_age
            
            if age_diff < MIN_PARENT_AGE:
                # Penalty when the parent is not at least MIN_PARENT_AGE years older
                log_like -= 10.0 * (MIN_PARENT_AGE - age_diff)
            elif age_diff > MAX_AGE_DIFF:
                # Penalty for extremely large age differences
                log_like -= 0.1 * (age_diff - MAX_AGE_DIFF)
    
    return log_like

This function penalizes biologically implausible age relationships, such as parents who are younger than their children or age differences that are too small or too large. By incorporating these constraints, Bonsai v3 can optimize pedigree configurations that respect both genetic evidence and biological reality.
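
The penalty for a single parent-child pair can be isolated to see its shape. This is a minimal sketch that copies the thresholds and slopes from the listing into a standalone helper; age_penalty and MAX_AGE_DIFF are names introduced here for illustration.

```python
MIN_PARENT_AGE = 15   # minimum plausible parent-child age gap, from the listing
MAX_AGE_DIFF = 100    # gap above which a mild penalty applies, from the listing


def age_penalty(parent_age: float, child_age: float) -> float:
    """Log-likelihood penalty for one parent-child pair."""
    age_diff = parent_age - child_age
    if age_diff < MIN_PARENT_AGE:
        return -10.0 * (MIN_PARENT_AGE - age_diff)
    if age_diff > MAX_AGE_DIFF:
        return -0.1 * (age_diff - MAX_AGE_DIFF)
    return 0.0


print(age_penalty(60, 30))   # plausible gap of 30 years
print(age_penalty(40, 30))   # gap of 10 years, 5 short of the minimum
print(age_penalty(30, 40))   # "parent" younger than the child
print(age_penalty(150, 40))  # gap of 110 years
```

The two slopes differ by a factor of 100, so a gap that is too small is treated as far stronger evidence against a configuration than a gap that is too large.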

Resolving Ambiguous Cases

In many cases, multiple pedigree configurations might explain the observed genetic data equally well. Bonsai v3 handles these ambiguous cases by maintaining multiple hypotheses and resolving them using additional evidence when available. This is implemented in the resolve_ambiguity function:

import math

def resolve_ambiguity(
    candidates: list[tuple[dict[int, dict[int, int]], float]],
    id_to_shared_ibd: dict[tuple[int, int], list[dict]],
    id_to_info: dict[int, dict],
    pw_ll: Any,
    min_like_diff: float = 3.0,
):
    """
    Resolve ambiguity between multiple candidate pedigrees.
    
    Args:
        candidates: List of (pedigree, log_likelihood) candidates
        id_to_shared_ibd: Dict mapping ID pairs to IBD segments
        id_to_info: Dict mapping IDs to demographic information
        pw_ll: PwLogLike instance for likelihood calculation
        min_like_diff: Minimum log-likelihood difference to consider decisive
        
    Returns:
        best_ped: The best pedigree after ambiguity resolution
        confidence: Confidence score (0-1) in the result
        alternatives: List of plausible alternative pedigrees
    """
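    # Nothing to resolve without candidates
    if not candidates:
        return None, 0.0, []
    
    # Sort candidates by likelihood (sorted() leaves the caller's list untouched)
    candidates = sorted(candidates, key=lambda x: x[1], reverse=True)
    
    # If only one candidate or large likelihood difference, unambiguous
    if len(candidates) == 1 or candidates[0][1] - candidates[1][1] >= min_like_diff:
        return candidates[0][0], 1.0, []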
    
    # Calculate comprehensive likelihood with all components
    scored_candidates = []
    for ped, _ in candidates:
        total_ll, components = evaluate_up_dict_with_likelihood_components(
            up_dct=ped,
            id_to_shared_ibd=id_to_shared_ibd,
            id_to_info=id_to_info,
            pw_ll=pw_ll,
            return_components=True,
        )
        scored_candidates.append((ped, total_ll, components))
    
    # Sort candidates by comprehensive likelihood
    scored_candidates.sort(key=lambda x: x[1], reverse=True)
    
    # Get best candidate and alternatives
    best_ped, best_ll, _ = scored_candidates[0]
    
    # Identify plausible alternatives
    alternatives = []
    for ped, ll, _ in scored_candidates[1:]:
        if best_ll - ll < min_like_diff:
            alternatives.append(ped)
    
    # Calculate confidence based on likelihood difference
    if alternatives:
        second_best_ll = scored_candidates[1][1]
        confidence = 1.0 - math.exp(-(best_ll - second_best_ll))
    else:
        confidence = 1.0
    
    return best_ped, confidence, alternatives

This function resolves ambiguity by:

Evaluating candidates using a comprehensive likelihood that includes all evidence
Identifying the best candidate based on total likelihood
Finding plausible alternatives whose likelihood is close to the best
Calculating a confidence score based on the likelihood difference
Returning the best pedigree, confidence score, and plausible alternatives

This approach allows Bonsai v3 to make a principled choice while still acknowledging uncertainty when multiple configurations are almost equally likely. The confidence score provides a quantitative measure of how certain the system is about the chosen configuration, which is crucial for transparent and reliable pedigree reconstruction.
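
To see how the confidence score behaves, the sketch below evaluates the expression from resolve_ambiguity, 1 - exp(-(best_ll - second_best_ll)), for a few log-likelihood gaps. The helper name confidence_from_gap is introduced here for illustration.

```python
import math


def confidence_from_gap(best_ll: float, second_best_ll: float) -> float:
    """Confidence as used in resolve_ambiguity: 1 - exp(-gap)."""
    return 1.0 - math.exp(-(best_ll - second_best_ll))


for gap in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(gap, round(confidence_from_gap(gap, 0.0), 3))
```

A gap of zero gives a confidence of 0, and a gap of 3 (the default min_like_diff) gives about 0.95. The score is a monotone mapping of the gap into the interval from 0 to 1 rather than a calibrated posterior probability.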

In cases where ambiguity cannot be resolved with available evidence, Bonsai v3 can maintain multiple hypotheses for future resolution when more data becomes available. This is particularly important for small pedigree structures, where ambiguity is common due to limited genetic data and the similarity of IBD patterns between different relationship types.

Systematic Search of Configuration Space

To find the optimal pedigree configuration, Bonsai v3 implements a systematic search of the configuration space. The search process is guided by likelihood scores and focuses computational resources on the most promising regions of the search space. This is implemented through the optimize_pedigree function:

import copy

def optimize_pedigree(
    up_dct: dict[int, dict[int, int]],
    id_to_shared_ibd: dict[tuple[int, int], list[dict]],
    id_to_info: dict[int, dict],
    pw_ll: Any,
    max_iterations: int = 10,
    max_alternatives: int = 5,
):
    """
    Optimize a pedigree through iterative improvements.
    
    Args:
        up_dct: Initial up-node dictionary pedigree
        id_to_shared_ibd: Dict mapping ID pairs to IBD segments
        id_to_info: Dict mapping IDs to demographic information
        pw_ll: PwLogLike instance for likelihood calculation
        max_iterations: Maximum number of optimization iterations
        max_alternatives: Maximum number of alternatives to consider
        
    Returns:
        optimized pedigree: The optimized pedigree structure
        log_likelihood: Log-likelihood of the optimized pedigree
    """
    current_ped = copy.deepcopy(up_dct)
    current_ll = get_ped_like(current_ped, id_to_shared_ibd, id_to_info, pw_ll)
    
    for iteration in range(max_iterations):
        # Generate alternative configurations
        alternatives = generate_alternatives(
            up_dct=current_ped,
            id_to_shared_ibd=id_to_shared_ibd,
            max_alternatives=max_alternatives,
        )
        
        # Evaluate alternatives
        best_alt = None
        best_alt_ll = current_ll
        
        for alt in alternatives:
            alt_ll = get_ped_like(alt, id_to_shared_ibd, id_to_info, pw_ll)
            
            if alt_ll > best_alt_ll:
                best_alt = alt
                best_alt_ll = alt_ll
        
        # If no alternative improves on the current likelihood, stop
        if best_alt is None:
            break
            
        # Update current pedigree
        current_ped = best_alt
        current_ll = best_alt_ll
    
    return current_ped, current_ll

This function implements a greedy hill-climbing search, which can stop at a local optimum rather than the global one:

Start with an initial pedigree configuration
Generate alternative configurations through systematic variations
Evaluate each alternative using the likelihood function
Select the best alternative if it improves the likelihood
Repeat until no further improvement or maximum iterations reached
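
The same loop structure can be exercised on a toy problem to see where it stops. The following is a minimal sketch in which integers stand in for pedigrees; hill_climb, neighbors, score, and LANDSCAPE are hypothetical stand-ins for optimize_pedigree, generate_alternatives, and get_ped_like.

```python
from typing import Callable


def hill_climb(
    start: int,
    neighbors: Callable[[int], list[int]],
    score: Callable[[int], float],
    max_iterations: int = 10,
) -> tuple[int, float]:
    """Move to the best-scoring neighbor until none improves the score."""
    current, current_score = start, score(start)
    for _ in range(max_iterations):
        best, best_score = None, current_score
        for candidate in neighbors(current):
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
        if best is None:
            break  # no neighbor improves on the current state
        current, current_score = best, best_score
    return current, current_score


# Toy landscape with a local peak at index 2 and the global peak at index 8
LANDSCAPE = [0, 1, 3, 2, 1, 2, 4, 6, 9, 5]


def neighbors(x: int) -> list[int]:
    return [n for n in (x - 1, x + 1) if 0 <= n < len(LANDSCAPE)]


def score(x: int) -> float:
    return LANDSCAPE[x]


print(hill_climb(0, neighbors, score))
print(hill_climb(5, neighbors, score))
```

Starting from index 0 the search stops on the local peak, while starting from index 5 it reaches the global one, which is why the quality of the initial pedigree matters for this kind of optimization.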

The alternative configurations are generated by generate_alternatives, which systematically varies the pedigree structure:

def generate_alternatives(
    up_dct: dict[int, dict[int, int]],
    id_to_shared_ibd: dict[tuple[int, int], list[dict]],
    max_alternatives: int = 5,
):
    """
    Generate alternative pedigree configurations.
    
    Args:
        up_dct: Current pedigree configuration
        id_to_shared_ibd: Dict mapping ID pairs to IBD segments
        max_alternatives: Maximum number of alternatives to generate
        
    Returns:
        alternatives: List of alternative pedigree configurations
    """
    alternatives = []
    
    # 1. Try adding new relationships between individuals with IBD
    for (id1, id2), ibd_segs in id_to_shared_ibd.items():
        # Skip if no significant IBD sharing (under 20 cM in total)
        if sum(seg['length_cm'] for seg in ibd_segs) < 20:
            continue
            
        # Skip unless both individuals are already in the pedigree
        if id1 not in up_dct or id2 not in up_dct:
            continue
            
        # Get current relationship
        rel_tuple = get_simple_rel_tuple(up_dct, id1, id2)
        
        # If no relationship exists, try parent-child in both directions
        if rel_tuple is None:
            for child, parent in ((id2, id1), (id1, id2)):
                new_ped = copy.deepcopy(up_dct)
                new_ped[child][parent] = 1
                alternatives.append(new_ped)
                
                # Return once the cap is reached so later stages cannot exceed it
                if len(alternatives) >= max_alternatives:
                    return alternatives
    
    # 2. Try adding new ancestors
    for node in up_dct:
        if len(up_dct[node]) < 2:
            # Add a parent
            new_ped, _ = add_parent(node, up_dct)
            alternatives.append(new_ped)
            
            if len(alternatives) >= max_alternatives:
                return alternatives
    
    # 3. Try removing relationships
    for child, parents in up_dct.items():
        for parent in list(parents.keys()):
            # Create a new configuration with this relationship removed
            new_ped = copy.deepcopy(up_dct)
            new_ped[child].pop(parent, None)
            alternatives.append(new_ped)
            
            if len(alternatives) >= max_alternatives:
                return alternatives
    
    return alternatives

This function generates alternatives through various operations:

Adding Relationships: Adding new parent-child relationships between individuals with IBD sharing
Adding Ancestors: Adding new ungenotyped parents to individuals with fewer than two parents
Removing Relationships: Removing existing parent-child relationships to explore simpler structures

By exploring the configuration space through these operations, Bonsai v3 searches for the small pedigree structure that best explains the observed genetic data while respecting biological constraints.
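
The add and remove operations can be tried on a tiny example. The sketch below assumes the up-node dictionary maps each child ID to a dictionary of {parent ID: 1}, as in the listings; with_parent_added and with_parent_removed are hypothetical helper names. Each edit works on a deep copy, so the starting pedigree stays available for generating further alternatives.

```python
import copy

UpDict = dict[int, dict[int, int]]


def with_parent_added(up_dct: UpDict, child: int, parent: int) -> UpDict:
    """Return a copy of up_dct in which parent is recorded as a parent of child."""
    new_ped = copy.deepcopy(up_dct)
    new_ped.setdefault(child, {})[parent] = 1
    new_ped.setdefault(parent, {})  # make sure the parent has its own entry
    return new_ped


def with_parent_removed(up_dct: UpDict, child: int, parent: int) -> UpDict:
    """Return a copy of up_dct without the parent-child link, if it exists."""
    new_ped = copy.deepcopy(up_dct)
    new_ped.get(child, {}).pop(parent, None)
    return new_ped


base = {1: {3: 1}, 2: {}, 3: {}}   # individual 3 is a parent of 1; 2 is unconnected
added = with_parent_added(base, child=2, parent=3)
removed = with_parent_removed(base, child=1, parent=3)

print(added)
print(removed)
print(base)  # unchanged by either edit
```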

Core Component: Bonsai v3's ability to optimize small pedigree configurations is essential for accurate pedigree reconstruction. By systematically exploring the space of possible configurations, evaluating them based on multiple types of evidence, and resolving ambiguities with additional constraints, Bonsai can identify the most likely pedigree structure that explains observed genetic data, even in the presence of noise and uncertainty.

Comparing Notebook and Production Code

The Lab14 notebook provides a simplified exploration of pedigree optimization, while the production implementation in Bonsai v3 includes additional sophistication:

Advanced Search Strategies: The production code uses more sophisticated search strategies that efficiently prune the vast configuration space
Comprehensive Likelihood Models: The real implementation includes more detailed likelihood models calibrated on large reference datasets
Optimization Heuristics: Production code includes various heuristics to focus computational resources on promising configurations
Uncertainty Quantification: More rigorous statistical methods for quantifying uncertainty and maintaining multiple hypotheses
Performance Optimization: Advanced techniques for caching, parallelization, and memory management to handle large-scale pedigrees
Sophisticated Constraints: More comprehensive biological constraints beyond just age and sex relationships

The notebook provides an educational introduction to the key concepts, but the production implementation represents years of refinement to handle the complexities of real-world pedigree optimization.

Interactive Lab Environment

Run the interactive Lab 14 notebook in Google Colab:

Google Colab Environment

Run the notebook in Google Colab for a powerful computing environment with access to Google's resources.

Data will be automatically downloaded from S3 when you run the notebook.

Note: You may need a Google account to save your work in Google Drive.

Open Lab 14 Notebook in Google Colab

Beyond the Code

As you explore pedigree optimization techniques, consider these broader implications:

Statistical Interpretation: The likelihood framework allows for principled statistical interpretation of pedigree reconstruction confidence
Ethical Considerations: Handling ambiguity responsibly is crucial, especially when reconstruction might reveal unexpected family relationships
Historical Applications: How optimization techniques can be applied to reconstruct historical pedigrees from limited genetic data
Computational Challenges: The balance between exhaustive search and efficient algorithms when scaling to larger pedigrees
Inter-Disciplinary Connections: How techniques from operations research, combinatorial optimization, and Bayesian statistics contribute to pedigree optimization

These considerations highlight how pedigree optimization is not just a technical challenge but one with significant methodological, ethical, and interdisciplinary dimensions that must be considered in applications of computational genetic genealogy.

This lab is part of the Bonsai v3 Deep Dive track.