Lab 23: Handling Twins and Close Relatives
Core Component: This lab explores the specialized algorithms in Bonsai v3's twins.py module for handling twins and other very close relatives, which present unique challenges for pedigree reconstruction. Understanding these special cases is essential for accurate family structure inference.
The Twin Challenge in Genetic Genealogy
Why Twins Require Special Handling
Twins create unique challenges for genetic genealogy algorithms due to their special genetic relationships:
Types of Twins and Their Genetic Signatures
- Monozygotic (Identical) Twins: Develop from a single fertilized egg that splits into two embryos
  - Share virtually 100% of their genetic material
  - Appear genetically indistinguishable using standard DNA tests
  - Same sex (both male or both female)
- Dizygotic (Fraternal) Twins: Develop from two separately fertilized eggs
  - Share approximately 50% of their DNA (like regular siblings)
  - Can be different sexes
  - Genetically similar to full siblings but born simultaneously
These unique genetic relationships create several challenges for pedigree reconstruction algorithms:
- Identical Genetic Profiles: Standard algorithms may be unable to distinguish between identical twins
- Relationship Ambiguity: The extremely high IBD sharing between identical twins can be confused with duplicate samples
- Pedigree Placement: Algorithms must ensure both twins are consistently placed in family structures
- Downstream Relationships: Children of twins have special genetic relationships that standard models don't capture
Impact on Pedigree Reconstruction
Without specialized handling for twins, pedigree reconstruction algorithms may:
- Misidentify identical twins as the same person
- Create inconsistent relationship assignments for twin pairs
- Incorrectly model relationships involving children of twins
- Generate pedigree structures that violate biological constraints
The twins.py Module in Bonsai v3
Specialized Twin Handling
Bonsai v3 includes a dedicated twins.py module that implements specialized algorithms for detecting and handling twin relationships:
from .constants import TWIN_THRESHOLD


def is_twin_pair(
    total_half: float,
    total_full: float,
    age1: int,
    age2: int,
    sex1: str,
    sex2: str,
):
    """
    Determine if a pair of individuals are twins.

    Args:
        total_half: The total length of half segments shared by the two individuals.
        total_full: The total length of full segments shared by the two individuals.
        age1: The age of the first individual.
        age2: The age of the second individual.
        sex1: Sex of the first individual.
        sex2: Sex of the second individual.

    Returns:
        True if the individuals are twins, False otherwise.
    """
    # handle unrelated people
    if total_half is None:
        return False
    if total_full is None:
        return False

    if total_half < TWIN_THRESHOLD:
        return False
    elif total_full < TWIN_THRESHOLD:
        return False
    elif sex1 != sex2:
        return False
    elif age1 and age2 and age1 != age2:
        return False

    return True
This module also provides functions for grouping twins into sets and integrating twin information into the pedigree reconstruction process:
def get_twin_sets(
    ibd_stat_dict: dict[frozenset, dict[str, int]],
    age_dict: dict[int, float],
    sex_dict: dict[int, str],
):
    """
    Find all sets of twins.

    Args:
        ibd_stat_dict: A dictionary of IBD statistics.
        age_dict: A dictionary mapping ID to age
        sex_dict: A dictionary mapping ID to sex ('M' or 'F')

    Returns:
        idx_to_twin_set: A dictionary mapping an index to a set of node IDs
            that form a twin set.
        id_to_idx: A dict mapping each twin ID to its index.
    """
    # Implementation details...
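The body of get_twin_sets is elided above. As a rough illustration only, one way to build such sets is to test every pair against the twin criteria and merge overlapping pairs, so that triplets end up in a single set. The sketch below is not Bonsai's implementation: the name group_twin_sets, the inner keys "total_half" and "total_full", and the 2800 cM threshold are assumptions made for the example.

```python
TWIN_THRESHOLD = 2800  # assumed value (cM); the real constant lives in Bonsai's constants module


def is_twin_pair(total_half, total_full, age1, age2, sex1, sex2):
    """Condensed copy of the checks in the is_twin_pair listing."""
    if total_half is None or total_full is None:
        return False
    if total_half < TWIN_THRESHOLD or total_full < TWIN_THRESHOLD:
        return False
    if sex1 != sex2:
        return False
    if age1 and age2 and age1 != age2:
        return False
    return True


def group_twin_sets(ibd_stat_dict, age_dict, sex_dict):
    """Hypothetical grouping step: merge twin pairs into disjoint sets."""
    idx_to_twin_set = {}
    id_to_idx = {}
    next_idx = 0
    for pair, stats in ibd_stat_dict.items():
        if len(pair) != 2:
            continue
        id1, id2 = sorted(pair)
        if not is_twin_pair(
            stats.get("total_half"), stats.get("total_full"),
            age_dict.get(id1), age_dict.get(id2),
            sex_dict.get(id1), sex_dict.get(id2),
        ):
            continue
        idx1, idx2 = id_to_idx.get(id1), id_to_idx.get(id2)
        if idx1 is None and idx2 is None:
            # Neither individual is in a set yet: start a new one
            idx = next_idx
            next_idx += 1
            idx_to_twin_set[idx] = {id1, id2}
        elif idx1 is not None and idx2 is not None:
            # Both already belong to sets: merge them if they differ
            idx = idx1
            if idx1 != idx2:
                idx_to_twin_set[idx] |= idx_to_twin_set.pop(idx2)
        else:
            # Exactly one is in a set: add the other to it
            idx = idx1 if idx1 is not None else idx2
            idx_to_twin_set[idx] |= {id1, id2}
        for member in idx_to_twin_set[idx]:
            id_to_idx[member] = idx
    return idx_to_twin_set, id_to_idx


twin_stats = {"total_half": 3400, "total_full": 3400}
ibd_stat_dict = {
    frozenset({1, 2}): twin_stats,
    frozenset({2, 3}): twin_stats,
    frozenset({1, 4}): {"total_half": 3400, "total_full": 0},  # parent-child pattern
    frozenset({5, 6}): twin_stats,
}
age_dict = {1: 30, 2: 30, 3: 30, 4: 55, 5: 12, 6: 12}
sex_dict = {1: "F", 2: "F", 3: "F", 4: "F", 5: "M", 6: "M"}

idx_to_twin_set, id_to_idx = group_twin_sets(ibd_stat_dict, age_dict, sex_dict)
print(idx_to_twin_set)
```

In this toy input, individuals 1, 2, and 3 are merged into one set, 5 and 6 form a second set, and individual 4 is left out because the pair with individual 1 has no fully-identical sharing.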
Key Features of the Module
- Twin Detection: Uses genetic and demographic criteria to identify potential twin pairs
- Twin Set Management: Groups twins into sets and maintains their relationships
- Integration with Pedigree Building: Ensures consistent placement of twins in pedigree structures
- Configuration Options: Provides tunable thresholds for twin identification
TWIN_THRESHOLD Constant
The TWIN_THRESHOLD constant in Bonsai v3 defines the minimum amount of shared DNA (in centimorgans) required to consider two individuals as potential twins. This threshold is typically set very high (e.g., 2800 cM) to avoid false positives while capturing true twin relationships.
Twin Detection Algorithms
Identifying Twin Relationships
Bonsai v3 uses several criteria to identify potential twin relationships:
1. Genetic Criteria
- Total IBD Sharing: Identical twins share far more DNA than any other pair of relatives, while fraternal twins share about as much as ordinary siblings
  - Identical twins: Nearly 100% (all chromosomes)
  - Fraternal twins: ~50% (similar to regular siblings)
- IBD Pattern Analysis: Examining the pattern of half-identical and fully-identical regions
  - Identical twins show extensive fully-identical regions
  - Fraternal twins show patterns similar to full siblings
2. Demographic Criteria
- Identical Age: Twins are typically born on the same day
  - When age data is available, twins should have the same birth year
  - Small discrepancies might exist in reported ages
- Sex Compatibility: Sex information provides additional constraints
  - Identical twins must be the same sex
  - Fraternal twins can be the same or different sexes
Distinguishing Twins from Parent-Child
One of the key challenges in twin detection is distinguishing identical twins from parent-child relationships, as both can exhibit very high IBD sharing. Bonsai uses several strategies:
- Age Differences: Parents and children typically have significant age gaps
- IBD Pattern Analysis: Parent-child relationships show half-identical regions but not fully-identical regions
- Consistency Checks: Parent-child relationships must fit into a consistent generational structure
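The first two strategies can be sketched as a small decision rule. This is an illustrative helper, not a Bonsai function: the name classify_close_pair, the 100 cM ceiling on fully-identical sharing for a parent-child pair, and the 12-year minimum generation gap are all assumptions, and total_half is assumed to count every region where at least one haplotype is shared.

```python
def classify_close_pair(total_half, total_full, age1=None, age2=None,
                        twin_threshold=2800.0, max_parent_child_full=100.0,
                        min_generation_gap=12):
    """Hypothetical helper: label a pair by its half/full IBD pattern and ages."""
    if total_half is None or total_full is None:
        return "unknown"
    if total_half < twin_threshold:
        # Not enough sharing for either twins or parent-child
        return "other"
    if total_full >= twin_threshold:
        # Nearly the whole genome is fully identical
        return "twin_candidate"
    if total_full > max_parent_child_full:
        # Too much full IBD for parent-child, too little for twins
        return "ambiguous"
    ages_known = age1 is not None and age2 is not None
    if ages_known and abs(age1 - age2) < min_generation_gap:
        # The IBD pattern looks like parent-child but the ages do not
        return "ambiguous"
    return "parent_child_candidate"


labels = {
    "twins": classify_close_pair(3400, 3400, 30, 30),
    "parent_child": classify_close_pair(3400, 0, 55, 30),
    "same_age_half_only": classify_close_pair(3400, 0, 30, 30),
    "siblings": classify_close_pair(2600, 900, 30, 28),
}
print(labels)
```

The third strategy, generational consistency, depends on the rest of the pedigree and is not captured by a pairwise check like this one.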
Implementation in is_twin_pair Function
The is_twin_pair function implements these criteria, evaluating:
- Total half-identical IBD sharing against the TWIN_THRESHOLD
- Total fully-identical IBD sharing against the TWIN_THRESHOLD
- Sex compatibility (requiring matching sex)
- Age compatibility (requiring matching age if available)
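Only if all these criteria are met does Bonsai classify a pair as twins. Because the fully-identical total must also exceed the threshold and the sexes must match, this check effectively identifies identical twins; fraternal twins share IBD like full siblings and do not pass it.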
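To see how the four checks interact, the example below runs a condensed copy of the is_twin_pair listing on a few hand-made inputs. The threshold value of 2800 cM is an assumption taken from the example figure quoted for TWIN_THRESHOLD, and the IBD totals are invented for illustration.

```python
TWIN_THRESHOLD = 2800  # assumed value (cM) for this example


def is_twin_pair(total_half, total_full, age1, age2, sex1, sex2):
    """Condensed copy of the checks in the is_twin_pair listing."""
    if total_half is None or total_full is None:
        return False
    if total_half < TWIN_THRESHOLD or total_full < TWIN_THRESHOLD:
        return False
    if sex1 != sex2:
        return False
    if age1 and age2 and age1 != age2:
        return False
    return True


results = {
    "identical_twins": is_twin_pair(3400, 3400, 30, 30, "F", "F"),  # True
    "different_sex": is_twin_pair(3400, 3400, 30, 30, "F", "M"),    # False
    "different_age": is_twin_pair(3400, 3400, 30, 31, "M", "M"),    # False
    "parent_child": is_twin_pair(3400, 0, 55, 30, "F", "F"),        # False
    "missing_age": is_twin_pair(3400, 3400, None, 30, "M", "M"),    # True
    "age_zero": is_twin_pair(3400, 3400, 0, 5, "M", "M"),           # True
    "no_ibd_data": is_twin_pair(None, None, 30, 30, "F", "F"),      # False
}
for name, value in results.items():
    print(f"{name}: {value}")
```

Two behaviors follow from the truthiness test on the ages: a missing age skips the age check entirely, and an age of 0 is treated the same way as a missing age.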
Twin Placement in Pedigrees
Ensuring Consistent Treatment of Twins
Once twins are identified, Bonsai must place them correctly in pedigree structures, maintaining biological consistency:
Key Principles of Twin Placement
- Common Parentage: Twins must share the same parents in the pedigree
- Generational Consistency: Twins must be placed in the same generation
- Relationship Consistency: Twins must have consistent relationships with all other individuals
Bonsai enforces these principles through specialized pedigree construction logic when twins are present:
Twin Placement Algorithm
# Pseudocode for twin-aware pedigree construction
for each twin_set in all_twin_sets:
    # Identify the best placement for one twin in the set
    primary_twin = select_representative_twin(twin_set)
    best_parents = identify_best_parents(primary_twin, pedigree, ibd_data)

    # Place all twins in the set with the same parents
    for twin in twin_set:
        add_individual_to_pedigree(twin, parents=best_parents)

    # Ensure consistent relationships with others
    for relative in all_individuals:
        if is_related(primary_twin, relative):
            relationship = get_relationship(primary_twin, relative)
            # Apply same relationship to all twins in the set
            for twin in twin_set:
                ensure_relationship(twin, relative, relationship)
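A runnable miniature of the same idea is shown below. It is a sketch under simplifying assumptions, not Bonsai's pedigree code: the pedigree is reduced to two plain dictionaries, the function name place_twin_set is hypothetical, and the best parents are passed in rather than inferred from IBD data.

```python
def place_twin_set(parents_of, relationship_to, twin_set, best_parents):
    """Hypothetical sketch: give all twins the same parents and relationships.

    parents_of:      dict mapping child ID -> frozenset of parent IDs
    relationship_to: dict mapping (individual, relative) -> relationship label
    """
    representative = min(twin_set)  # any deterministic choice works here

    # Place all twins in the set with the same parents
    for twin in twin_set:
        parents_of[twin] = frozenset(best_parents)

    # Copy the representative's relationships to every twin in the set
    for (individual, relative), label in list(relationship_to.items()):
        if individual != representative or relative in twin_set:
            continue
        for twin in twin_set:
            relationship_to[(twin, relative)] = label
    return parents_of, relationship_to


parents_of = {}
relationship_to = {("t1", "aunt"): "niece", ("t1", "t2"): "twin"}
place_twin_set(parents_of, relationship_to, {"t1", "t2"}, ("mother", "father"))
print(parents_of)
print(relationship_to)
```

Relationships between members of the twin set are skipped on purpose, so a twin never receives a relationship to itself.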
Implementation Challenges
Several practical challenges arise when implementing twin placement:
- Evidence Conflicts: Different twins may have slightly different IBD patterns with other relatives due to testing or analysis variation
- Priority Decisions: When evidence conflicts, the algorithm must decide which relationships to prioritize
- Computational Efficiency: Handling twins can increase the complexity of pedigree construction algorithms
Bonsai addresses these challenges through a combination of evidence pooling, confidence-weighted decision making, and specialized optimization strategies.
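The text does not spell out how evidence pooling works. One plausible reading, sketched below purely as an assumption, is to combine the IBD totals that each twin shares with a given relative into a single confidence-weighted average, so that small measurement differences between twins do not lead to different placements. The function name pool_twin_evidence and the weighting scheme are hypothetical.

```python
def pool_twin_evidence(twin_totals, weights=None):
    """Weighted mean of the IBD totals (cM) each twin shares with one relative.

    twin_totals: dict mapping twin ID -> total IBD shared with the relative
    weights:     optional dict mapping twin ID -> confidence weight
    """
    if not twin_totals:
        raise ValueError("twin_totals must not be empty")
    if weights is None:
        weights = {twin: 1.0 for twin in twin_totals}
    total_weight = sum(weights[twin] for twin in twin_totals)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(twin_totals[twin] * weights[twin] for twin in twin_totals)
    return weighted_sum / total_weight


totals = {"t1": 850.0, "t2": 870.0}
equal = pool_twin_evidence(totals)
weighted = pool_twin_evidence(totals, weights={"t1": 3.0, "t2": 1.0})
print(equal, weighted)  # 860.0 855.0
```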
Other Very Close Relatives
Beyond Simple Twin Relationships
Twins are just one example of special genetic relationships that require specialized handling. Bonsai v3 also addresses several other scenarios involving very close relatives:
1. Double First Cousins
Double first cousins occur when two siblings from one family marry two siblings from another family. Their children are related through both parents:
- Share approximately 25% of their DNA (similar to half-siblings)
- Can be confused with half-siblings in genetic analysis
- Require pedigree context to correctly identify
2. Parent-Child Incest Cases
Children resulting from parent-child relationships have unusual genetic patterns:
- Share approximately 75% of their DNA with the parent who is also their grandparent (50% by direct inheritance plus about 25% through the other parent)
- Have elevated homozygosity due to consanguinity
- Present special ethical considerations in reporting
3. Compound Relationships
Individuals can be related through multiple pathways, creating complex genetic signatures:
- Half-siblings who are also first cousins
- Double cousins through multiple family connections
- Relationships in endogamous communities with multiple connections
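The expected sharing figures for these multi-path relationships follow from standard path counting: each path of n meioses connecting two people through a common ancestor contributes (1/2)^n, and the contributions add up. The sketch below applies that rule, assuming the common ancestors are not themselves inbred; the function name is chosen for this example.

```python
from fractions import Fraction


def coefficient_of_relationship(path_lengths):
    """Sum (1/2)**n over all connecting paths, where n counts meioses on the path."""
    return sum((Fraction(1, 2) ** n for n in path_lengths), Fraction(0))


relationships = {
    # One shared parent: a single path of 2 meioses
    "half_siblings": [2],
    # Two shared grandparents: two paths of 4 meioses
    "first_cousins": [4, 4],
    # Four shared grandparents: four paths of 4 meioses
    "double_first_cousins": [4, 4, 4, 4],
    # One shared parent plus two shared grandparents on the other side
    "half_siblings_and_first_cousins": [2, 4, 4],
}
coefficients = {
    name: coefficient_of_relationship(paths)
    for name, paths in relationships.items()
}
for name, value in coefficients.items():
    print(f"{name}: {value} ({float(value):.3f})")
```

Double first cousins and half-siblings both come out at 1/4, which is why total sharing alone cannot separate them and pedigree context is needed.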
Endogamy and Its Impact
Endogamy refers to the practice of marrying within a relatively closed community, resulting in elevated background relatedness. In endogamous populations:
- Individuals share more DNA than expected for their genealogical relationship
- Standard relationship prediction models can overestimate closeness
- Pedigree reconstruction is more complex due to multiple paths of relationship
Bonsai v3 includes specialized adjustments to account for these effects when endogamy is detected.
Special Handling Implementation
Bonsai implements several strategies to handle these complex cases:
- Multi-Pathway Analysis: Evaluating how multiple relationship paths contribute to observed IBD
- Pattern Recognition: Identifying characteristic patterns of complex relationships
- Model Adjustment: Modifying likelihood models to account for unusual sharing patterns
- Consistency Enforcement: Ensuring pedigree structures maintain biological consistency
The Identical Twin Problem
Fundamental Limitations of Genetic Differentiation
Identical twins present a fundamental challenge for genetic genealogy because standard DNA tests cannot reliably distinguish between them. This creates what we call "The Identical Twin Problem":
Key Aspects of the Problem
- Genetic Indistinguishability: Standard autosomal DNA tests show effectively identical results
- Algorithmic Ambiguity: Algorithms cannot determine which twin is which based solely on genetic data
- Downstream Propagation: Ambiguity affects relationships with all descendants
Strategies for Handling Identical Twins
While perfect genetic differentiation is generally not possible with standard tests, several approaches can help manage the identical twin problem:
- Incorporating Non-Genetic Information
  - Birth order and birth dates
  - Names and other identifying information
  - Known relationships from documentary sources
- Special Notation and Visualization
  - Explicit twin labeling in pedigrees
  - Visual elements showing potential ambiguity
  - Alternative pedigree representations
- Advanced Genetic Techniques (beyond standard tests)
  - Rare somatic mutations that differ between twins
  - Epigenetic markers that diverge over lifetime
  - Specialized testing targeting post-zygotic mutations
Fallback Strategy in Bonsai
When Bonsai identifies identical twins but cannot distinguish between them, it implements a fallback strategy:
- Maintain both twins in the pedigree structure
- Apply identical relationship predictions to both twins
- Flag the twins as indistinguishable in visualization and reporting
- Allow user input to resolve ambiguity based on non-genetic information
Twin-Specific Visualization and Reporting
Communicating Twin Relationships
Effective visualization and reporting of twin relationships is critical for user understanding. Bonsai implements several specialized approaches:
1. Visual Representation of Twins
- Node Styling: Special visual markers for twin nodes in pedigree diagrams
- Connection Styling: Special edges connecting twins to each other
- Color Coding: Visual distinction between identical and fraternal twins
2. Confidence Reporting
- Twin Confidence Metrics: Quantitative measures of confidence in twin identification
- Explanation Codes: Notation explaining the evidence supporting twin classification
- Alternative Relationship Scores: Showing scores for competing hypotheses
3. Communicating Ambiguity
- Explicit Uncertainty Notation: Clear indication when twins cannot be distinguished
- Alternative Placement Visualization: Showing multiple possible pedigree configurations
- Confidence Intervals: Visualizing confidence in twin-related predictions
Improving User Understanding
Effective visualization of twin relationships should:
- Make twin relationships immediately apparent in pedigree diagrams
- Clearly distinguish between identical and fraternal twins
- Help users understand downstream implications of twin relationships
- Communicate confidence and potential ambiguity
- Provide options for resolving ambiguity with additional information
Handling Twin Children
Special Genetic Relationships Through Twin Parents
Children of twins have unique genetic relationships that require special modeling:
Children of Identical Twins
- Genetic Half-Siblings: Children of identical twins are genetically equivalent to half-siblings, even though they are legally cousins
- Expected Sharing: ~25% of their DNA (similar to half-siblings)
- Pedigree Representation: Requires specialized notation to accurately represent the genetic relationship
Children of Fraternal Twins
- Standard First Cousins: Genetically equivalent to ordinary first cousins, because fraternal twins are genetically full siblings
- Expected Sharing: ~12.5% of their DNA (typical for first cousins)
- Age Considerations: Often much closer in age than typical first cousins
Implementation Approaches
Bonsai implements several strategies to correctly model relationships involving children of twins:
- Twin-Aware Relationship Inference: Adjusting relationship likelihoods based on twin status of parents
- Special Relationship Categories: Defining specialized categories for twin-specific relationships
- Modified Age Models: Adjusting age-based constraints for relationships through twin parents
- Consistency Enforcement: Ensuring all relationships through twins maintain biological consistency
Modeling Complex Twin Relationships
# Pseudocode for handling children of twins
# (get_parents and standard_relationship_inference are assumed helpers)
def infer_relationship_with_twin_awareness(id1, id2, ibd_data, twin_sets):
    # Collect the known parents of each individual
    parents1 = set(get_parents(id1))
    parents2 = set(get_parents(id2))

    # A shared parent means ordinary siblings, not a twin-specific case
    if not parents1 & parents2:
        for twin_set in twin_sets:
            # Each individual must have a different parent in the same twin set
            if parents1 & set(twin_set) and parents2 & set(twin_set):
                # These individuals are genetic half-siblings
                # through identical twin parents
                return {
                    "legal_relationship": "first_cousins",
                    "genetic_relationship": "half_siblings",
                    "expected_sharing": 0.25,
                    "notes": "Parents are identical twins",
                }

    # Other twin-specific relationship checks...

    # If no twin-specific condition applies, use standard inference
    return standard_relationship_inference(id1, id2, ibd_data)
Conclusion and Next Steps
Handling twins and other very close relatives is a crucial aspect of computational genetic genealogy. Bonsai v3's specialized algorithms in the twins.py module provide robust mechanisms for detecting twin relationships, ensuring consistent pedigree placement, and correctly modeling complex genetic relationships involving twins.
By understanding the unique challenges posed by twins and implementing tailored solutions, Bonsai can create more accurate and biologically consistent pedigree reconstructions, even in the presence of these special cases.
In the next lab, we'll explore how Bonsai v3 handles complex relationship patterns through specialized logic in the relationships.py module, building on the foundation of twin handling to address an even broader range of special relationship scenarios.
Interactive Lab Environment
Run the interactive Lab 23 notebook in Google Colab.
Data will be automatically downloaded from S3 when you run the notebook.
Note: You may need a Google account to save your work in Google Drive.