Lab 07: PwLogLike Class | Computational Genetic Genealogy

Lab 07: The PwLogLike Class and Likelihood Computation

Core Component: This lab examines the PwLogLike class, the computational engine at the heart of Bonsai v3's relationship inference capabilities. Understanding this class is essential for grasping how Bonsai quantifies the probability of different relationships from genetic and demographic evidence.

Class Architecture and Design

Design Philosophy

The PwLogLike class embodies a key principle in Bonsai v3's architecture: encapsulation of statistical models into cohesive objects that handle both data and computation. This class is responsible for transforming raw IBD data and demographic information into quantitative assessments of relationship likelihoods.

Key design principles include:

Encapsulation: The class contains both the data and the methods needed for likelihood computation, promoting modular design
Caching: Extensive use of internal caching to avoid redundant computation of expensive likelihood calculations
Comprehensive Evidence Integration: Combining multiple sources of evidence (genetic and demographic) with appropriate weighting
Bayesian Framework: Implementation of a principled Bayesian approach to relationship inference
Performance Optimization: Careful attention to computational efficiency for handling large datasets

The class is defined in likelihoods.py and serves as the computational foundation for all relationship inference in Bonsai v3, from pairwise assessments to complete pedigree reconstruction.

Class Initialization and Input Data

The PwLogLike class is initialized with several key data sources:

class PwLogLike:
    def __init__(self, bio_info=None, unphased_ibd_seg_list=None, phased_ibd_seg_list=None,
                 condition_pair_set=None, mean_bgd_num=None, mean_bgd_len=None,
                 min_seg_len=7.0):
        """Initialize with IBD data and demographic information.
        
        Args:
            bio_info: List of dictionaries with biographical information
                Each dictionary must have 'genotype_id' and can have 'age', 'sex', 'coverage'
            unphased_ibd_seg_list: List of unphased IBD segments
                [id1, id2, chromosome, start_pos, end_pos, is_full_ibd, length_cm]
            phased_ibd_seg_list: Optional list of phased IBD segments
                [id1, id2, hap1, hap2, chromosome, start_cm, end_cm, length_cm]
            condition_pair_set: Optional set of pairs to condition on
            mean_bgd_num: Mean number of background IBD segments
            mean_bgd_len: Mean length of background IBD segments
            min_seg_len: Minimum segment length threshold
        """

The bio_info parameter contains demographic information:

genotype_id: Unique identifier for an individual
age: Individual's age, used for age-based likelihood calculation
sex: Individual's biological sex, used for relationship validation
coverage: Proportion of the genome that was successfully genotyped

The IBD segment lists come in two formats:

Unphased format: [id1, id2, chromosome, start_pos, end_pos, is_full_ibd, length_cm]
Phased format: [id1, id2, hap1, hap2, chromosome, start_cm, end_cm, length_cm]
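As a minimal sketch of how the two row layouts above might be normalized into one record type, the code below maps each field by position. The `IbdSegment` dataclass and the `parse_unphased`/`parse_phased` helpers are hypothetical names for illustration, not Bonsai v3's API. Phased rows carry no IBD2 flag, so this sketch leaves `is_full_ibd` as `None` for them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IbdSegment:
    """Hypothetical common record for one IBD segment (illustration only)."""
    id1: int
    id2: int
    chromosome: str
    start: float                         # bp for unphased rows, cM for phased rows
    end: float
    length_cm: float
    is_full_ibd: Optional[bool] = None   # only present in the unphased format
    hap1: Optional[int] = None           # only present in the phased format
    hap2: Optional[int] = None


def parse_unphased(row):
    # [id1, id2, chromosome, start_pos, end_pos, is_full_ibd, length_cm]
    id1, id2, chrom, start, end, is_full, length_cm = row
    return IbdSegment(id1, id2, str(chrom), start, end, length_cm,
                      is_full_ibd=bool(is_full))


def parse_phased(row):
    # [id1, id2, hap1, hap2, chromosome, start_cm, end_cm, length_cm]
    id1, id2, hap1, hap2, chrom, start_cm, end_cm, length_cm = row
    return IbdSegment(id1, id2, str(chrom), start_cm, end_cm, length_cm,
                      hap1=hap1, hap2=hap2)
```

Note that the coordinate units differ between the formats (base pairs versus centimorgans), so `start` and `end` should not be compared across records parsed from different formats.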

During initialization, PwLogLike processes these inputs to create several internal data structures:

ibd_stat_dict: Dictionary mapping individual pairs to IBD statistics
id_to_info: Dictionary mapping individual IDs to biographical information
age_diff_dict: Dictionary mapping individual pairs to age differences
coverage_dict: Dictionary mapping individual pairs to coverage information
_ll_cache: Cache for computed log-likelihoods to avoid redundant computation

Turning raw IBD segments into statistics requires handling edge cases such as overlapping segments and chromosome boundaries. The _compute_ibd_stats() method performs this step and produces the per-pair statistics stored in ibd_stat_dict.

Internal Data Structures

The PwLogLike class maintains several internal data structures that speed up repeated queries and support its inference capabilities:

The ibd_stat_dict maps individual pairs to dictionaries containing key statistics:

ibd_stat_dict = {
    frozenset([id1, id2]): {
        'total_half': 1250.0,   # Total cM of IBD1 (half-identical) segments
        'total_full': 350.0,    # Total cM of IBD2 (fully identical) segments
        'num_half': 25,         # Number of IBD1 segments
        'num_full': 5,          # Number of IBD2 segments
        'max_seg_cm': 120.0,    # Length of largest segment
        'half_seg_lens': [50.0, 45.0, ...],  # List of IBD1 segment lengths
        'full_seg_lens': [80.0, 75.0, ...]   # List of IBD2 segment lengths
    },
    ...
}

These statistics are extracted using a careful process that ensures accurate representation of IBD sharing. The extraction involves several steps:

Sorting segments by chromosome and position
Handling overlapping segments to avoid double-counting
Classifying segments as IBD1 or IBD2 based on the is_full_ibd flag
Computing aggregate statistics while accounting for coverage differences
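The extraction steps above can be sketched as follows. This is a simplified stand-in for `_compute_ibd_stats()`, not its actual code. It assumes the unphased row layout, drops segments shorter than `min_seg_len`, resolves overlaps crudely by keeping only the longer of two overlapping segments of the same IBD type on the same chromosome, and skips the coverage adjustment entirely.

```python
from collections import defaultdict


def compute_ibd_stats(unphased_segs, min_seg_len=7.0):
    """Simplified sketch of per-pair IBD statistics (not Bonsai's code).

    Rows follow [id1, id2, chromosome, start_pos, end_pos, is_full_ibd, length_cm].
    """
    # Group segments by unordered pair, IBD type, and chromosome
    groups = defaultdict(list)
    for id1, id2, chrom, start, end, is_full, length_cm in unphased_segs:
        if length_cm < min_seg_len:
            continue  # assumed: short segments are excluded from the statistics
        key = (frozenset([id1, id2]), bool(is_full), chrom)
        groups[key].append((start, end, length_cm))

    ibd_stat_dict = {}
    for (pair, is_full, _chrom), segs in groups.items():
        segs.sort()  # sort by start position
        kept = []
        for start, end, length_cm in segs:
            if kept and start < kept[-1][1]:
                # Overlaps the last kept segment: keep only the longer one
                if length_cm > kept[-1][2]:
                    kept[-1] = (start, end, length_cm)
            else:
                kept.append((start, end, length_cm))

        entry = ibd_stat_dict.setdefault(pair, {
            'total_half': 0.0, 'total_full': 0.0,
            'num_half': 0, 'num_full': 0, 'max_seg_cm': 0.0,
            'half_seg_lens': [], 'full_seg_lens': [],
        })
        kind = 'full' if is_full else 'half'
        for _start, _end, length_cm in kept:
            entry[f'total_{kind}'] += length_cm
            entry[f'num_{kind}'] += 1
            entry[f'{kind}_seg_lens'].append(length_cm)
            entry['max_seg_cm'] = max(entry['max_seg_cm'], length_cm)
    return ibd_stat_dict
```

Keying on `frozenset([id1, id2])` makes the lookup independent of which individual appears first in a row, matching the ibd_stat_dict layout shown above.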

The id_to_info dictionary provides efficient access to demographic information:

id_to_info = {
    1001: {'age': 70, 'sex': 'M', 'coverage': 0.95},
    1002: {'age': 40, 'sex': 'F', 'coverage': 0.90},
    ...
}

This structure enables rapid lookups of individual attributes without repeatedly parsing the original bio_info list. The implementation includes validation to ensure consistent data and handling of missing values.
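A minimal sketch of building these lookup tables from `bio_info`, assuming the field names listed earlier. The helper name `build_lookup_tables`, the validation rules, and the choice to key `age_diff_dict` by ordered pair (so the sign records who is older) are assumptions for illustration; the production code may store these differently.

```python
def build_lookup_tables(bio_info):
    """Hypothetical helper: build id_to_info and age_diff_dict from bio_info."""
    id_to_info = {}
    for record in bio_info:
        if 'genotype_id' not in record:
            raise ValueError(f"bio_info record lacks 'genotype_id': {record}")
        gid = record['genotype_id']
        if gid in id_to_info:
            raise ValueError(f"duplicate genotype_id {gid}")
        # Keep only the optional fields; missing values are stored as None
        id_to_info[gid] = {key: record.get(key) for key in ('age', 'sex', 'coverage')}

    # Age differences (id1 minus id2) for every ordered pair with known ages
    age_diff_dict = {}
    for id1, info1 in id_to_info.items():
        for id2, info2 in id_to_info.items():
            if id1 != id2 and info1['age'] is not None and info2['age'] is not None:
                age_diff_dict[(id1, id2)] = info1['age'] - info2['age']
    return id_to_info, age_diff_dict
```

Precomputing every pair is quadratic in the number of individuals; for large datasets, computing age differences on demand from `id_to_info` is the cheaper alternative.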

The likelihood cache is particularly important for performance optimization:

_ll_cache = {
    (id1, id2, (up, down, num_ancs)): -12.34,  # Cached log-likelihood
    ...
}

This caching mechanism dramatically improves performance when evaluating multiple relationship hypotheses for the same pair of individuals, a common operation during pedigree reconstruction.
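One subtlety: the relationship tuple is directional. (0, 1, 1) means id1 is the parent of id2, while (1, 0, 1) means the reverse, so a cache keyed on (id1, id2, relationship_tuple) stores the same hypothesis twice when a pair is queried in both orders. The sketch below builds a canonical key that swaps up and down whenever it swaps the IDs. It assumes IDs are mutually comparable; `canonical_cache_key` and `CountingCache` are hypothetical helpers, and the production cache may not do this.

```python
def canonical_cache_key(id1, id2, relationship_tuple):
    """Hypothetical helper: order-independent key for a directional hypothesis."""
    up, down, num_ancs = relationship_tuple
    if id1 <= id2:
        return (id1, id2, (up, down, num_ancs))
    # Swapping the individuals swaps the up and down meiosis counts
    return (id2, id1, (down, up, num_ancs))


class CountingCache:
    """Tiny demo: counts how often the expensive computation actually runs."""

    def __init__(self, compute):
        self._compute = compute      # compute(id1, id2, relationship_tuple)
        self._ll_cache = {}
        self.misses = 0

    def get(self, id1, id2, relationship_tuple):
        key = canonical_cache_key(id1, id2, relationship_tuple)
        if key not in self._ll_cache:
            self.misses += 1
            self._ll_cache[key] = self._compute(*key)
        return self._ll_cache[key]
```

For example, `cache.get(1001, 1002, (0, 1, 1))` followed by `cache.get(1002, 1001, (1, 0, 1))` triggers only one computation. Any other argument that changes the result, such as `condition` in get_log_like(), must also be part of the key.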

Genetic Likelihood Computation

The Core Likelihood Method

The core method for likelihood computation in PwLogLike is get_log_like(), which orchestrates the overall likelihood calculation:

def get_log_like(self, id1, id2, relationship_tuple, condition=True):
    """Calculate the log-likelihood of a relationship between two individuals.
    
    Args:
        id1, id2: IDs of the individuals
        relationship_tuple: (up, down, num_ancs) tuple
        condition: Whether to condition on observing IBD
        
    Returns:
        Log-likelihood of the relationship
    """
    # Check cache (`condition` changes the result, so it is part of the key)
    cache_key = (id1, id2, relationship_tuple, condition)
    if cache_key in self._ll_cache:
        return self._ll_cache[cache_key]
    
    # Get genetic likelihood
    gen_ll = self.get_pw_gen_ll(id1, id2, relationship_tuple, condition)
    
    # Get age-based likelihood if available
    if id1 in self.id_to_info and id2 in self.id_to_info:
        age1 = self.id_to_info[id1].get('age')
        age2 = self.id_to_info[id2].get('age')
        
        if age1 is not None and age2 is not None:
            # Calculate age weight (depends on quality of age information)
            age_weight = self._get_age_weight(id1, id2)
            
            # Get age-based likelihood
            age_ll = self.get_pw_age_ll(id1, id2, relationship_tuple)
            
            # Combine likelihoods
            ll = gen_ll + age_weight * age_ll
        else:
            ll = gen_ll
    else:
        ll = gen_ll
    
    # Cache and return
    self._ll_cache[cache_key] = ll
    return ll

This method demonstrates the two-step process at the heart of Bonsai's relationship inference:

Calculate the genetic likelihood based on IBD statistics
Incorporate age-based likelihood when demographic information is available

Because results are cached, a hypothesis that the search revisits for the same pair is looked up rather than recomputed.

Genetic Likelihood Components

The genetic likelihood calculation is implemented in the get_pw_gen_ll() method, which combines multiple sources of genetic evidence:

def get_pw_gen_ll(self, id1, id2, relationship_tuple, condition=True):
    """Calculate genetic log-likelihood of a relationship.
    
    Args:
        id1, id2: IDs of the individuals
        relationship_tuple: (up, down, num_ancs) tuple
        condition: Whether to condition on observing IBD
        
    Returns:
        Genetic log-likelihood
    """
    # Get IBD statistics
    pair_key = frozenset([id1, id2])
    if pair_key not in self.ibd_stat_dict:
        # No IBD observed between these individuals
        if condition:
            return -20.0  # Very low but not impossible
        # Otherwise, calculate the probability of no IBD given the
        # relationship [implementation details elided]
        raise NotImplementedError("no-IBD likelihood omitted from this excerpt")
    
    stats = self.ibd_stat_dict[pair_key]
    
    # Get relationship parameters
    up, down, num_ancs = relationship_tuple
    meiotic_distance = up + down
    
    # Calculate likelihood components
    
    # 1. Segment count likelihood
    count_ll = self._get_count_likelihood(stats, relationship_tuple)
    
    # 2. Segment length likelihood
    length_ll = self._get_length_likelihood(stats, relationship_tuple)
    
    # 3. IBD2 proportion likelihood
    ibd2_ll = self._get_ibd2_likelihood(stats, relationship_tuple)
    
    # 4. Total IBD likelihood
    total_ll = self._get_total_ibd_likelihood(stats, relationship_tuple)
    
    # Combine components with appropriate weights
    # [weighting logic...]; an unweighted sum stands in for it here
    combined_ll = count_ll + length_ll + ibd2_ll + total_ll
    
    return combined_ll

The genetic likelihood combines four main components:

Segment Count Likelihood: How well the observed number of segments matches expectations
Segment Length Likelihood: How well the distribution of segment lengths matches expectations
IBD2 Proportion Likelihood: How well the proportion of IBD2 segments matches expectations
Total IBD Likelihood: How well the total amount of IBD sharing matches expectations

Each component is calculated using statistical models from the moments module, which we explored in earlier labs. For example, the segment count likelihood uses a Poisson distribution with parameters derived from relationship properties:

from scipy.stats import poisson

def _get_count_likelihood(self, stats, relationship_tuple):
    """Calculate likelihood of observed segment count.
    
    Args:
        stats: IBD statistics dictionary
        relationship_tuple: (up, down, num_ancs) tuple
        
    Returns:
        Log-likelihood of observed segment count
    """
    # Extract observed counts
    num_half = stats['num_half']
    num_full = stats['num_full']
    
    # Get expected counts from moments module
    up, down, num_ancs = relationship_tuple
    meiotic_distance = up + down
    
    eta_half = moments.get_eta_half(meiotic_distance, num_ancs, self.min_seg_len)
    eta_full = moments.get_eta_full(meiotic_distance, num_ancs, self.min_seg_len)
    
    # Calculate likelihoods using Poisson distribution. `stats` is the IBD
    # statistics dict here, so scipy's poisson is imported by name instead
    # of being accessed as stats.poisson
    half_ll = poisson.logpmf(num_half, eta_half)
    full_ll = poisson.logpmf(num_full, eta_full)
    
    # Combine likelihoods
    return half_ll + full_ll

These likelihood computations let Bonsai compare relationship hypotheses even when the genetic data are noisy or incomplete. The production implementation adds refinements for edge cases, background IBD, and detector-specific characteristics.
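To make the count and length components concrete, here is a self-contained sketch that substitutes the ERSA-style approximation (Huff et al., 2011) for Bonsai's `moments` functions. For meiotic distance d and a common ancestors, the expected number of IBD segments is a(rd + c)/2^(d-1), segment lengths are exponential with mean 100/d cM, and by memorylessness the excess over `min_seg_len` is exponential with the same mean. The constants (roughly 35 crossovers per meiosis, 22 autosomes) and all function names are assumptions; Bonsai's own moments differ in detail. Only the standard library is used, so `poisson_logpmf` and `expon_logpdf` reimplement what `scipy.stats.poisson.logpmf` and `scipy.stats.expon.logpdf` compute.

```python
import math

R_PER_MEIOSIS = 35.3   # approx. crossovers per meiosis (assumed constant)
N_AUTOSOMES = 22


def poisson_logpmf(k, mu):
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def expon_logpdf(x, mean):
    return -math.log(mean) - x / mean if x >= 0 else -math.inf


def expected_segment_count(d, num_ancs, min_seg_len):
    """ERSA-style expected number of IBD segments longer than min_seg_len."""
    n_all = num_ancs * (R_PER_MEIOSIS * d + N_AUTOSOMES) / 2 ** (d - 1)
    # Lengths ~ Exponential(mean 100/d cM), so P(length > t) = exp(-d*t/100)
    return n_all * math.exp(-d * min_seg_len / 100.0)


def count_and_length_ll(seg_lens, d, num_ancs, min_seg_len=7.0):
    """Count + length log-likelihood of observed segments at meiotic distance d."""
    eta = expected_segment_count(d, num_ancs, min_seg_len)
    count_ll = poisson_logpmf(len(seg_lens), eta)
    # Memorylessness: the excess over the threshold is Exponential(mean 100/d)
    length_ll = sum(expon_logpdf(x - min_seg_len, 100.0 / d) for x in seg_lens)
    return count_ll + length_ll
```

With twenty hypothetical 40 cM segments, `count_and_length_ll(lens, 4, 2)` (a first-cousin-like hypothesis) scores higher than `count_and_length_ll(lens, 8, 2)` (third-cousin-like): the observed count and the long segments both fit the closer relationship better.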

Handling Background IBD

A critical aspect of PwLogLike's genetic likelihood calculation is its handling of background IBD: segments shared by individuals with no recent common ancestor, arising from distant common ancestry or from technical artifacts such as false-positive segment calls.

The implementation includes background IBD modeling through parameters mean_bgd_num and mean_bgd_len, which characterize the expected background IBD for a given population:

from scipy.stats import expon, poisson

def _get_background_likelihood(self, stats):
    """Calculate likelihood of observed IBD as background sharing.
    
    Args:
        stats: IBD statistics dictionary
        
    Returns:
        Log-likelihood under background model
    """
    # Extract observed statistics (`stats` is the IBD statistics dict)
    num_segs = stats['num_half'] + stats['num_full']
    seg_lens = stats['half_seg_lens'] + stats['full_seg_lens']
    
    # Calculate likelihood using background model
    # Segment count follows Poisson distribution
    count_ll = poisson.logpmf(num_segs, self.mean_bgd_num)
    
    # Each segment length follows an exponential distribution, so the length
    # term is a sum over segments (the mean of n exponentials is not itself
    # exponential); an empty list contributes 0.0
    length_ll = expon.logpdf(seg_lens, scale=self.mean_bgd_len).sum()
    
    return count_ll + length_ll

This background model serves two critical functions:

It provides a baseline for comparison when evaluating relationships
It helps distinguish true IBD from chance sharing or technical artifacts

The background parameters can be calibrated for different populations, allowing Bonsai to account for population-specific patterns of distant relatedness. This is particularly important for endogamous populations, where background IBD levels may be elevated.
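The "baseline for comparison" role can be illustrated with a log-likelihood ratio between a relationship model and the background model, both using a Poisson segment count and independent exponential segment lengths. This is a sketch under assumptions: the relationship model here is reduced to an expected count and a mean length, and the parameter values in the usage note (`mean_bgd_num=2.0`, `mean_bgd_len=10.0`, a relationship expecting 30 segments of mean 25 cM) are hypothetical, not calibrated values.

```python
import math


def segments_log_like(seg_lens, expected_count, mean_len):
    """Poisson segment count plus independent exponential segment lengths."""
    n = len(seg_lens)
    count_ll = n * math.log(expected_count) - expected_count - math.lgamma(n + 1)
    length_ll = sum(-math.log(mean_len) - x / mean_len for x in seg_lens)
    return count_ll + length_ll


def log_ratio_vs_background(seg_lens, rel_count, rel_mean_len,
                            mean_bgd_num, mean_bgd_len):
    """Positive values favour the relationship over chance background sharing."""
    return (segments_log_like(seg_lens, rel_count, rel_mean_len)
            - segments_log_like(seg_lens, mean_bgd_num, mean_bgd_len))
```

With these hypothetical parameters, a single 8 cM segment gives a negative ratio (background explains it better), while twenty 40 cM segments give a strongly positive one. Raising `mean_bgd_num` and `mean_bgd_len` for an endogamous population shifts borderline cases toward the background explanation.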

Age-Based Likelihood Computation

Age Difference Models

The PwLogLike class can also incorporate age information into relationship inference. This is implemented in the get_pw_age_ll() method:

from scipy.stats import norm

def get_pw_age_ll(self, id1, id2, relationship_tuple):
    """Calculate age-based log-likelihood of a relationship.
    
    Args:
        id1, id2: IDs of the individuals
        relationship_tuple: (up, down, num_ancs) tuple
        
    Returns:
        Age-based log-likelihood
    """
    # Get age information (unknown individuals are treated as having no age)
    age1 = self.id_to_info.get(id1, {}).get('age')
    age2 = self.id_to_info.get(id2, {}).get('age')
    
    if age1 is None or age2 is None:
        return 0.0  # No age information available
    
    # Calculate age difference
    age_diff = age1 - age2
    
    # Get relationship-specific age parameters
    up, down, num_ancs = relationship_tuple
    
    # Parent-child relationships
    if up == 0 and down == 1:  # id1 is parent of id2
        mean_diff = 30.0
        std_dev = 8.0
        # Biological constraint: parent must be at least 16 years older
        if age_diff < 16:  # Minimum parent-child age gap
            return float('-inf')  # Biologically impossible
    elif up == 1 and down == 0:  # id1 is child of id2
        mean_diff = -30.0
        std_dev = 8.0
        # Biological constraint: child must be at least 16 years younger
        if age_diff > -16:  # Minimum parent-child age gap
            return float('-inf')  # Biologically impossible
    
    # Sibling relationships
    elif up == 1 and down == 1:
        mean_diff = 0.0
        std_dev = 10.0  # Siblings can vary more in age
    
    # [additional relationship types...]
    
    else:
        return 0.0  # Types not modeled in this excerpt: age is uninformative
    
    # Calculate log-likelihood using normal distribution
    return norm.logpdf(age_diff, mean_diff, std_dev)

The method models the age difference separately for each relationship type. Key aspects include:

Relationship-Specific Distributions: Different relationships have different expected age differences
Biological Constraints: Hard constraints for biologically impossible age differences
Variance Models: Different standard deviations for different relationship types
Directionality: Accounting for the direction of the age difference (who is older)

These age models have been calibrated using demographic data from real families, ensuring that they accurately reflect typical age patterns in human populations.
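The directionality point matters because genetic evidence alone cannot tell a parent from a child: both orientations of a parent-child pair imply the same IBD sharing. The standalone sketch below reuses the parameter values from the excerpt above (mean gap 30 years, SD 8, minimum gap 16, siblings SD 10) with a standard-library normal density. `age_log_like` is a hypothetical name, and treating all other relationships as uninformative is a simplification.

```python
import math

MIN_PARENT_GAP = 16  # minimum parent-child age gap, as in the excerpt above


def norm_logpdf(x, mean, sd):
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def age_log_like(age1, age2, relationship_tuple):
    """Age log-likelihood for a few relationship types (sketch)."""
    if age1 is None or age2 is None:
        return 0.0
    age_diff = age1 - age2
    up, down, _num_ancs = relationship_tuple
    if (up, down) == (0, 1):        # id1 is parent of id2
        if age_diff < MIN_PARENT_GAP:
            return -math.inf
        return norm_logpdf(age_diff, 30.0, 8.0)
    if (up, down) == (1, 0):        # id1 is child of id2
        if age_diff > -MIN_PARENT_GAP:
            return -math.inf
        return norm_logpdf(age_diff, -30.0, 8.0)
    if (up, down) == (1, 1):        # siblings
        return norm_logpdf(age_diff, 0.0, 10.0)
    return 0.0  # other relationships: treated as uninformative here
```

For individuals aged 70 and 40, `age_log_like(70, 40, (0, 1, 1))` is finite while `age_log_like(70, 40, (1, 0, 1))` is negative infinity, so the age term alone settles which of the two is the parent.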

Integrating Age with Genetic Evidence

The get_log_like() method integrates age-based and genetic likelihoods using a weighted combination:

combined_ll = gen_ll + age_weight * age_ll

The age_weight parameter controls the influence of age information and is dynamically determined based on several factors:

Age Reliability: How reliable the age information is estimated to be
Relationship Type: Some relationships are more strongly constrained by age than others
Age Difference Magnitude: Extreme age differences may be given more weight

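The _get_age_weight() method implements this weighting. The simplified version below adjusts only for age-information quality, read from an optional 'age_quality' field that defaults to 1.0: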

def _get_age_weight(self, id1, id2):
    """Calculate weight for age-based likelihood.
    
    Args:
        id1, id2: IDs of the individuals
        
    Returns:
        Weight for age-based likelihood
    """
    # Base weight
    base_weight = 0.25
    
    # Adjust based on age information quality
    info1 = self.id_to_info[id1]
    info2 = self.id_to_info[id2]
    
    age1_quality = info1.get('age_quality', 1.0)
    age2_quality = info2.get('age_quality', 1.0)
    
    # Reduce weight for less reliable age information
    quality_factor = min(age1_quality, age2_quality)
    
    return base_weight * quality_factor

This integration of multiple evidence sources is a key strength of Bonsai v3, allowing it to leverage both genetic and demographic information for more accurate relationship inference.
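The effect of the weight can be seen by ranking two hypotheses whose genetic log-likelihoods are close. The sketch below reproduces the combination rule from get_log_like() and the quality rule from _get_age_weight(); the function names and the example numbers in the usage note are hypothetical. It also skips the age term when the weight is zero, because 0 × (−∞) is nan in IEEE floating point, which would otherwise poison the ranking when an impossible age gap meets a zero-quality age.

```python
def age_weight(info1, info2, base_weight=0.25):
    """Same rule as the _get_age_weight excerpt: scale by the worse age quality."""
    quality = min(info1.get('age_quality', 1.0), info2.get('age_quality', 1.0))
    return base_weight * quality


def combined_log_like(gen_ll, age_ll, info1, info2):
    """gen_ll + age_weight * age_ll, skipping the age term when it has no weight."""
    weight = age_weight(info1, info2)
    if age_ll is None or weight == 0.0:
        # Also avoids 0 * -inf, which is nan in IEEE floating point
        return gen_ll
    return gen_ll + weight * age_ll
```

Suppose hypothesis A has gen_ll = -20.0 and age_ll = -3.0, and hypothesis B has gen_ll = -19.5 and age_ll = -8.0. With full-quality ages (weight 0.25), A scores -20.75 and B scores -21.5, so A wins. If one age has age_quality 0.2 (weight 0.05), A scores -20.15 and B scores -19.9, so the genetically preferred B wins.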

Biological Constraint Validation

Beyond age-based likelihood calculation, PwLogLike also rejects relationships that are structurally or biologically impossible, using sex information where it is available. This is handled by the is_valid_relationship() method:

def is_valid_relationship(self, id1, id2, relationship_tuple):
    """Check if a relationship is biologically valid.
    
    Args:
        id1, id2: IDs of the individuals
        relationship_tuple: (up, down, num_ancs) tuple
        
    Returns:
        True if the relationship is biologically valid
    """
    up, down, num_ancs = relationship_tuple
    
    # Parent-child constraints do not depend on sex, so check them first.
    # A parent is the pair's only common ancestor, so num_ancs == 2 would
    # make one individual both biological parents of the other.
    if up == 0 and down == 1 and num_ancs == 2:  # id1 as parent of id2
        return False
    if up == 1 and down == 0 and num_ancs == 2:  # id1 as child of id2
        return False
    
    # Get sex information
    sex1 = self.id_to_info.get(id1, {}).get('sex')
    sex2 = self.id_to_info.get(id2, {}).get('sex')
    
    # If sex information is missing, skip the sex-based checks
    if sex1 is None or sex2 is None:
        return True
    
    # [additional constraints for other relationship types...]
    
    return True

This validation ensures that inferred relationships respect biological constraints, such as:

A single individual cannot be both biological parents of a child
Two males or two females cannot be the biological parents of a child
Certain relationship combinations are biologically impossible

By combining these biological constraints with the statistical likelihood models, PwLogLike ensures that inferred relationships are both statistically likely and biologically plausible.
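A pairwise method only ever sees two individuals, so the "two males or two females" rule needs a check that sees both of a child's recorded parents at once, for example while a pedigree is being assembled. A minimal sketch, assuming sex codes 'M'/'F' as in id_to_info and a hypothetical `parents_of` map from child ID to parent IDs; `parents_sex_consistent` is an illustrative helper, not a Bonsai v3 function.

```python
def parents_sex_consistent(child, parents_of, sex_of):
    """Return False if a child's recorded parents are biologically impossible.

    parents_of: dict child_id -> set of parent ids (hypothetical structure)
    sex_of: dict id -> 'M', 'F', or None when unknown
    """
    parents = parents_of.get(child, set())
    if len(parents) > 2:
        return False  # more than two biological parents is impossible
    if len(parents) < 2:
        return True   # nothing to compare
    sexes = [sex_of.get(p) for p in parents]
    if None in sexes:
        return True   # unknown sex: cannot rule the pairing out
    return sexes[0] != sexes[1]
```

As in is_valid_relationship(), missing sex information is treated permissively: the check only rejects configurations it can prove impossible.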

Relationship Inference Applications

The Most Likely Relationship

The culmination of PwLogLike's capabilities is the get_most_likely_rel() method, which infers the most likely relationship between two individuals:

def get_most_likely_rel(self, id1, id2, max_degree=4):
    """Find the most likely relationship between two individuals.
    
    Args:
        id1, id2: IDs of the individuals
        max_degree: Maximum relationship degree to consider
        
    Returns:
        Tuple of (relationship_tuple, log_likelihood)
    """
    # Generate all possible relationship tuples up to max_degree
    relationship_tuples = []
    
    # Add self relationship
    if id1 == id2:
        relationship_tuples.append((0, 0, 2))
    
    # Add direct lineage relationships
    for deg in range(1, max_degree + 1):
        relationship_tuples.append((0, deg, 1))  # id1 is ancestor of id2
        relationship_tuples.append((deg, 0, 1))  # id1 is descendant of id2
    
    # Add collateral relationships
    for up in range(1, max_degree + 1):
        for down in range(1, max_degree + 1):
            if up + down <= max_degree * 2:
                relationship_tuples.append((up, down, 2))  # Full relationship
                relationship_tuples.append((up, down, 1))  # Half relationship
    
    # Calculate likelihood for each relationship
    likelihoods = []
    for rel_tuple in relationship_tuples:
        # Check if relationship is valid
        if not self.is_valid_relationship(id1, id2, rel_tuple):
            continue
            
        # Calculate log-likelihood
        log_ll = self.get_log_like(id1, id2, rel_tuple)
        likelihoods.append((rel_tuple, log_ll))
    
    # If no valid relationships, return None
    if not likelihoods:
        return None, float('-inf')
    
    # Sort by likelihood (highest first)
    likelihoods.sort(key=lambda x: x[1], reverse=True)
    
    # Return the most likely relationship
    return likelihoods[0]

This method implements a comprehensive search over the space of possible relationships, using several key steps:

Generate all possible relationship tuples up to a specified degree
Filter out biologically invalid relationships
Calculate the log-likelihood for each valid relationship
Sort relationships by likelihood and return the best one

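The method balances coverage against cost. With up and down each capped at max_degree, it scores 2 × max_degree direct-line tuples plus 2 × max_degree² collateral tuples (40 candidates for the default max_degree=4, plus the self tuple when id1 == id2), rather than searching every possible relationship, which would be computationally prohibitive for distant relationships.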
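The enumeration in get_most_likely_rel() can be reproduced on its own to see how quickly the candidate set grows with max_degree. `candidate_relationships` is a hypothetical standalone helper that generates the same tuples in the same order.

```python
def candidate_relationships(max_degree=4, include_self=False):
    """Enumerate (up, down, num_ancs) tuples as get_most_likely_rel does."""
    candidates = [(0, 0, 2)] if include_self else []
    for deg in range(1, max_degree + 1):
        candidates += [(0, deg, 1), (deg, 0, 1)]          # ancestor / descendant
    for up in range(1, max_degree + 1):
        for down in range(1, max_degree + 1):
            candidates += [(up, down, 2), (up, down, 1)]  # full / half
    return candidates
```

The count is quadratic in max_degree: raising it to 6 gives 84 candidates per pair. Each candidate is one get_log_like() call, so scoring all 1,225 pairs among 50 individuals at the default max_degree=4 already means 49,000 likelihood evaluations, which is why the cache matters.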

Relationship Confidence and Ambiguity

In addition to finding the most likely relationship, PwLogLike includes methods for assessing confidence and detecting ambiguity in relationship inference. These are implemented in methods like get_relationship_confidence():

def get_relationship_confidence(self, id1, id2, relationship_tuple, max_degree=4):
    """Calculate confidence in a relationship hypothesis.
    
    Args:
        id1, id2: IDs of the individuals
        relationship_tuple: (up, down, num_ancs) tuple
        max_degree: Maximum relationship degree to consider
        
    Returns:
        Confidence score (0-1)
    """
    # Calculate likelihood of the specified relationship
    target_ll = self.get_log_like(id1, id2, relationship_tuple)
    
    # Find the most likely alternative relationship
    alternative_ll = float('-inf')
    
    # Generate all possible relationship tuples
    # [Similar to get_most_likely_rel, but excluding the target relationship]
    
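    # Calculate likelihood ratio (Bayes factor) on the log scale
    log_ratio = target_ll - alternative_ll
    
    # Convert to confidence score LR / (1 + LR), written in the equivalent
    # logistic form 1 / (1 + exp(-log LR)): if no alternative was found
    # (alternative_ll = -inf), this returns 1.0 instead of inf / inf = nan
    confidence = 1.0 / (1.0 + np.exp(-log_ratio))
    
    return confidence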

This method quantifies the strength of evidence for a relationship by comparing it to alternative hypotheses. High confidence scores indicate strong evidence, while low scores suggest ambiguity or uncertainty.
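Exponentiating a log-likelihood difference before dividing can overflow when the gap is large. The sketch below computes the same LR / (1 + LR) score from log-likelihoods without overflow, and generalizes it to a posterior over every candidate using the log-sum-exp trick. The uniform prior and both function names are assumptions for illustration, not Bonsai v3 behavior. With exactly two hypotheses, the target's posterior equals the confidence score.

```python
import math


def confidence_from_log_likes(target_ll, alternative_ll):
    """LR / (1 + LR) computed as a logistic of the log-likelihood difference."""
    diff = target_ll - alternative_ll
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    z = math.exp(diff)           # diff < 0, so this cannot overflow
    return z / (1.0 + z)


def posterior_probs(log_likes):
    """Posterior over hypotheses under a uniform prior (log-sum-exp trick)."""
    m = max(log_likes.values())
    weights = {rel: math.exp(ll - m) for rel, ll in log_likes.items()}
    total = sum(weights.values())
    return {rel: w / total for rel, w in weights.items()}
```

A log-likelihood gap of 2 gives a confidence of about 0.88, and a gap of 1,000 returns 0.0 or 1.0 cleanly instead of raising an overflow error.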

The system also includes methods for identifying ambiguous relationship groups—sets of relationships that are difficult to distinguish based on available evidence:

def get_ambiguous_relationships(self, id1, id2, log_ll_threshold=2.0):
    """Find relationships that cannot be confidently distinguished.
    
    Args:
        id1, id2: IDs of the individuals
        log_ll_threshold: Threshold for considering relationships ambiguous
        
    Returns:
        List of ambiguous relationship tuples
    """
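    # Get all relationship likelihoods as (rel_tuple, log_ll) pairs
    # [similar to get_most_likely_rel]
    
    # If no relationship could be scored, there is nothing to compare
    if not likelihoods:
        return []
    
    # Sort by likelihood
    likelihoods.sort(key=lambda x: x[1], reverse=True)
    
    # Get the most likely relationship
    best_rel, best_ll = likelihoods[0]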
    
    # Find relationships that are ambiguous (likelihood within threshold)
    ambiguous_rels = [rel_tuple for rel_tuple, ll in likelihoods 
                      if best_ll - ll <= log_ll_threshold]
    
    return ambiguous_rels

This approach to uncertainty quantification is crucial for reliable pedigree reconstruction, as it prevents the system from making overconfident inferences when the evidence is ambiguous.
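The default log_ll_threshold of 2.0 means a relationship counts as ambiguous if the best hypothesis is less than e² ≈ 7.4 times more likely. The sketch below applies the same rule to a dictionary of log-likelihoods. The log-likelihood values are hypothetical, chosen to mimic the classic case of relationships that share about 25% of the genome (half siblings, grandparent, avuncular); `ambiguous_set` is an illustrative name.

```python
import math

# Likelihood ratio implied by a log-likelihood gap of 2.0 (about 7.39)
RATIO_AT_DEFAULT_THRESHOLD = math.exp(2.0)


def ambiguous_set(log_likes, log_ll_threshold=2.0):
    """Relationships whose log-likelihood is within the threshold of the best."""
    if not log_likes:
        return []
    ranked = sorted(log_likes.items(), key=lambda kv: kv[1], reverse=True)
    best_ll = ranked[0][1]
    return [rel for rel, ll in ranked if best_ll - ll <= log_ll_threshold]


# Hypothetical log-likelihoods for one pair of individuals
example = {
    (1, 1, 1): -50.2,   # half siblings
    (0, 2, 1): -51.0,   # id1 is grandparent of id2
    (1, 2, 2): -50.7,   # id1 is full uncle/aunt of id2
    (2, 2, 2): -58.3,   # full first cousins
}
```

Here `ambiguous_set(example)` returns the three 25%-sharing hypotheses and excludes first cousins. Age information is often what separates such a group, since a grandparent is typically much older than a half sibling.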

Applications in Pedigree Construction

The PwLogLike class is used extensively throughout Bonsai v3's pedigree construction process:

Initial Relationship Assessment: At the start of pedigree construction, a PwLogLike instance is created to assess all pairwise relationships:

def build_pedigree(bio_info, unphased_ibd_seg_list, ...):
    """Main entry point for pedigree reconstruction.
    
    Args:
        bio_info: Biographical information
        unphased_ibd_seg_list: IBD segments
        ...
        
    Returns:
        Reconstructed pedigree
    """
    # Initialize PwLogLike
    pw_ll = PwLogLike(bio_info, unphased_ibd_seg_list, ...)
    
    # Use it for initial relationship assessment
    all_genotype_ids = [info['genotype_id'] for info in bio_info]
    all_relationships = {}
    
    for i in range(len(all_genotype_ids)):
        for j in range(i+1, len(all_genotype_ids)):
            id1 = all_genotype_ids[i]
            id2 = all_genotype_ids[j]
            
            # Get most likely relationship
            rel_tuple, log_ll = pw_ll.get_most_likely_rel(id1, id2)
            
            # Store if sufficiently likely
            if log_ll > THRESHOLD:
                all_relationships[(id1, id2)] = (rel_tuple, log_ll)
    
    # [Proceed with pedigree construction using these relationships]

Connection Point Evaluation: When merging pedigree fragments, PwLogLike is used to evaluate potential connection points:

def find_connection_points(pedigree1, pedigree2, pw_ll):
    """Find optimal points to connect two pedigrees.
    
    Args:
        pedigree1, pedigree2: Pedigrees to connect
        pw_ll: PwLogLike instance
        
    Returns:
        List of potential connection points with scores
    """
    # Get individuals in each pedigree
    ids1 = set(pedigree1.keys())
    ids2 = set(pedigree2.keys())
    
    # Evaluate all possible connections
    connections = []
    for id1 in ids1:
        for id2 in ids2:
            # Get relationship likelihood
            rel_tuple, log_ll = pw_ll.get_most_likely_rel(id1, id2)
            
            # Add to potential connections if sufficiently likely
            if log_ll > THRESHOLD:
                connections.append((id1, id2, rel_tuple, log_ll))
    
    # Sort by likelihood and return
    connections.sort(key=lambda x: x[3], reverse=True)
    return connections

Pedigree Validation: After constructing a pedigree, PwLogLike is used to validate the inferred relationships:

def validate_pedigree(pedigree, pw_ll):
    """Validate a pedigree against observed IBD data.
    
    Args:
        pedigree: Pedigree to validate
        pw_ll: PwLogLike instance
        
    Returns:
        Dictionary of validation results
    """
    # Extract all observed individuals
    observed_ids = {node for node in pedigree if node >= 0}  # Exclude inferred (negative-ID) individuals
    
    # Check each pair of observed individuals
    validation_results = {}
    for id1 in observed_ids:
        for id2 in observed_ids:
            if id1 >= id2:
                continue
                
            # Get relationship in pedigree
            pedigree_rel = pedigrees.get_simple_rel_tuple(pedigree, id1, id2)
            
            # Get most likely relationship from IBD
            inferred_rel, log_ll = pw_ll.get_most_likely_rel(id1, id2)
            
            # Check if they match
            is_consistent = pedigree_rel == inferred_rel
            
            # Store result
            validation_results[(id1, id2)] = {
                'pedigree_relationship': pedigree_rel,
                'inferred_relationship': inferred_rel,
                'log_likelihood': log_ll,
                'is_consistent': is_consistent
            }
    
    return validation_results

These examples illustrate how the PwLogLike class serves as a core component throughout the pedigree reconstruction process, providing the quantitative foundation for relationship inference at every stage.
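The per-pair output of validate_pedigree() can be condensed into a pedigree-level summary. `summarize_validation` is a hypothetical helper. When reading its output, keep in mind that an inconsistent pair is not necessarily an error in the pedigree: when several relationships are nearly equally likely, the single most likely one can legitimately differ from the pedigree's relationship.

```python
def summarize_validation(validation_results):
    """Summarize the dict returned by validate_pedigree()."""
    total = len(validation_results)
    inconsistent = sorted(
        pair for pair, result in validation_results.items()
        if not result['is_consistent']
    )
    # No pairs checked means there is no meaningful rate to report
    rate = (total - len(inconsistent)) / total if total else None
    return {'pairs_checked': total, 'consistency_rate': rate,
            'inconsistent_pairs': inconsistent}
```

Cross-checking each inconsistent pair against get_ambiguous_relationships() separates genuine conflicts from cases where the evidence simply cannot distinguish the pedigree's relationship from the inferred one.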

Core Component: The PwLogLike class is the computational engine that powers Bonsai v3's relationship inference capabilities. By encapsulating sophisticated statistical models in a cohesive object-oriented design, it provides a flexible and powerful framework for integrating multiple sources of evidence to infer relationships between individuals, forming the foundation for robust pedigree reconstruction.

Comparing Notebook and Production Code

The Lab07 notebook provides a simplified exploration of the PwLogLike class, while the actual implementation in Bonsai v3 includes additional sophistication:

Caching Mechanisms: The production code includes sophisticated caching strategies to avoid redundant computation
Error Handling: Comprehensive handling of edge cases and error conditions
Relationship-Specific Models: Specialized handling for different relationship types
Calibrated Parameters: Empirically derived parameters for different populations
Performance Optimizations: Careful attention to computational efficiency
Population-Specific Adjustments: The ability to adjust parameters for different population backgrounds

The notebook introduces the key concepts; the production implementation adds the refinements needed to handle the complexities of real-world genetic data and diverse human populations.

Interactive Lab Environment

Run the interactive Lab 07 notebook in Google Colab:

Google Colab Environment

Run the notebook in Google Colab for a powerful computing environment with access to Google's resources.

Data will be automatically downloaded from S3 when you run the notebook.

Note: You may need a Google account to save your work in Google Drive.


Beyond the Code

As you explore the PwLogLike class, consider these broader implications:

Object-Oriented Design: How encapsulating data and computation in a cohesive class enables modular and maintainable code
Evidence Integration: The challenges and opportunities of combining multiple sources of evidence for inference
Statistical Modeling: How probabilistic models can handle the inherent uncertainty in biological systems
Bayesian Integration: The effectiveness of Bayesian approaches for combining prior knowledge with observed data

These considerations highlight how the PwLogLike class represents not just a technical solution but a principled approach to inference under uncertainty, with applications beyond genetic genealogy to other domains involving complex network inference.

This lab is part of the Bonsai v3 Deep Dive track:

Lab 01: Introduction
Lab 02: Architecture
Lab 03: IBD Formats
Lab 04: Statistics
Lab 05: Models
Lab 06: Relationships
Lab 07: PwLogLike
Lab 08: Age Modeling
Lab 09: Data Structures
Lab 10: Up-Node Dict