Computational Genetic Genealogy

Lab 28: Integration with Other Genealogical Tools

Core Component: This lab explores how Bonsai v3 integrates with other genealogical tools and systems, particularly through the DRUID algorithm and other integration mechanisms. Understanding these integration capabilities is essential for creating comprehensive genetic genealogy workflows that leverage multiple data sources and analytical approaches.

Beyond Standalone Analysis

The Integration Imperative

While Bonsai v3 provides powerful genetic relationship inference capabilities, real-world genetic genealogy typically involves multiple tools and data sources. Effective integration with other systems enables more comprehensive analysis and better results:

Key Integration Benefits
  • Complementary Capabilities: Different tools excel at different aspects of genetic genealogy
  • Multiple Data Sources: Incorporating DNA, documentary, and contextual information
  • Workflow Continuity: Supporting end-to-end genetic genealogy processes
  • Expertise Leverage: Utilizing specialized algorithms from various domains
  • Ecosystem Compatibility: Fitting into existing user workflows and tool chains

Bonsai v3's integration capabilities enable it to function both as a standalone analysis tool and as a component within larger genetic genealogy workflows.

The Genetic Genealogy Ecosystem

Bonsai integrates with several categories of external tools and systems:

  • DNA Testing Platforms: Direct-to-consumer testing companies and research databases
  • Family Tree Systems: Genealogical record management tools
  • IBD Detection Tools: Specialized algorithms for identifying IBD segments
  • Population Genetics Software: Tools for analyzing population structure and admixture
  • Visualization Systems: Specialized tools for representing genetic relationships

The DRUID Algorithm

Deep Relatedness Utilizing Identity by Descent

One of Bonsai v3's key integration mechanisms is the DRUID (Deep Relatedness Utilizing Identity by Descent) algorithm, implemented in the druid.py module. It provides a standardized relationship-degree estimate that external systems can consume:

DRUID Core Functionality

The DRUID algorithm uses a generalized approach to infer relationship degrees from IBD sharing data:

from typing import Optional

def infer_degree_generalized_druid(
    total_ibd: float,
    num_segments: Optional[int] = None,
    longest_segment: Optional[float] = None,
    total_full_ibd: Optional[float] = None,
) -> float:
    """
    Infer relationship degree using the generalized DRUID algorithm.
    
    This algorithm estimates the degree of relationship based on
    total IBD sharing and optional segment characteristics.
    
    Args:
        total_ibd: Total IBD sharing in centiMorgans
        num_segments: Optional number of IBD segments
        longest_segment: Optional length of longest segment in cM
        total_full_ibd: Optional total fully identical region length
        
    Returns:
        Estimated relationship degree (1.0 = first degree, etc.)
    """
    # Implementation uses model-based prediction of relationship degree
    # based on IBD statistics, calibrated with known relationships

This function provides a standardized interface for relationship inference that external systems can easily incorporate, without needing to understand Bonsai's more complex internal mechanisms.
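
To make the idea of a relationship degree concrete, here is a minimal sketch. It is not the druid.py implementation. It assumes that `total_ibd` covers every region shared on at least one haplotype, that `total_full_ibd` is the subset shared on both haplotypes (IBD2), and that the autosomal genetic map is roughly 3400 cM long (a placeholder to replace with the length of the map you actually use). The function name `kinship_degree_sketch` and the constant are hypothetical. The sketch converts the two totals to a kinship coefficient and then to a continuous degree.

```python
import math
from typing import Optional

# Assumed autosomal map length in cM; replace with your own map's length.
ASSUMED_GENOME_CM = 3400.0


def kinship_degree_sketch(
    total_ibd: float,
    total_full_ibd: float = 0.0,
    genome_cm: float = ASSUMED_GENOME_CM,
) -> Optional[float]:
    """Return a continuous degree estimate, or None if no IBD is shared."""
    # Fraction of the genome shared on exactly one haplotype (IBD1)
    ibd1_frac = max(total_ibd - total_full_ibd, 0.0) / genome_cm
    # Fraction of the genome shared on both haplotypes (IBD2)
    ibd2_frac = max(total_full_ibd, 0.0) / genome_cm
    # Kinship coefficient: phi = IBD1/4 + IBD2/2
    kinship = ibd1_frac / 4.0 + ibd2_frac / 2.0
    if kinship <= 0.0:
        return None
    # phi is 1/4 at degree 1 and halves with each additional degree
    return -math.log2(kinship) - 1.0
```

Under these assumptions a parent-child pair (IBD1 across the whole genome) and a full-sibling pair (half IBD1, a quarter IBD2) both come out at degree 1, and a pair sharing about 850 cM of IBD1 comes out at degree 3.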

DRUID Integration Example
# Example of how an external tool might use DRUID
def analyze_match_with_druid(match_data):
    """
    Analyze a DNA match using the DRUID algorithm.
    
    Args:
        match_data: Dictionary with match statistics
        
    Returns:
        Dictionary with relationship prediction
    """
    # Extract IBD statistics from match data
    total_ibd = match_data['shared_cm']
    num_segments = match_data.get('num_segments')
    longest_segment = match_data.get('longest_segment')
    
    # Call DRUID algorithm
    degree = infer_degree_generalized_druid(
        total_ibd=total_ibd,
        num_segments=num_segments,
        longest_segment=longest_segment
    )
    
    # Convert degree to relationship description
    relationship = degree_to_relationship(degree)
    
    return {
        'predicted_degree': degree,
        'relationship_description': relationship,
        'confidence': calculate_confidence(total_ibd, degree)
    }
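
The example above calls `degree_to_relationship` without defining it. One possible version, offered as a hypothetical helper rather than anything Bonsai ships, rounds the continuous degree and looks it up in a small table. The labels follow the usual degree convention (for instance, second cousins are fifth-degree relatives) and list only the most common relationships at each degree.

```python
# Hypothetical lookup table; each degree covers more relationships than listed.
DEGREE_LABELS = {
    0: "identical twin or duplicate sample",
    1: "parent/child or full sibling",
    2: "grandparent, half sibling, or aunt/uncle",
    3: "first cousin or great-grandparent",
    4: "first cousin once removed",
    5: "second cousin",
}


def degree_to_relationship(degree):
    """Map a continuous degree estimate to a human-readable description."""
    if degree is None:
        return "no relationship detected"
    rounded = round(degree)
    if rounded < 0:
        raise ValueError(f"degree must be non-negative, got {degree}")
    # Degrees beyond the table are reported generically
    return DEGREE_LABELS.get(rounded, f"distant relative (degree {rounded})")
```

Because several relationships share a degree, the description is a set of candidates and not a single answer; age and pedigree context are needed to choose among them.
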
Key DRUID Advantages
  • Simplicity: Straightforward interface requiring minimal data
  • Standardization: Consistent relationship degree scale
  • Robustness: Works with varied input quality and completeness
  • Calibration: Empirically calibrated with known relationships
  • Extensibility: Can incorporate additional evidence when available

Data Exchange Formats

Standardized Information Transfer

Effective integration requires standardized data exchange formats. Bonsai v3 supports several key formats for importing and exporting genetic and relationship data:

IBD Data Formats

Bonsai supports several common IBD data formats:

| Format | Description | Common Sources |
| --- | --- | --- |
| Phased IBD Format | Detailed segment data with phase information | Research tools like hap-IBD, Refined-IBD |
| Unphased Segment Format | Simpler format without phase information | Consumer testing companies, GERMLINE |
| Summary Statistics Format | Aggregated IBD metrics without segment details | Consumer websites, limited data sharing |
| Match List Format | Simple listing of genetic matches and basic metrics | Consumer testing platforms, simple exports |

Pedigree Data Formats

For exchanging pedigree information, Bonsai supports:

  • GEDCOM: Standard genealogical data exchange format
  • CSV Relationship Format: Simple tabular relationship data
  • JSON Pedigree Format: Hierarchical pedigree representation
  • Graph Exchange XML Format (GEXF): Standard format for network structures

Format Conversion Example

# Example of converting between IBD formats
def convert_to_bonsai_format(external_segment_data, format_type):
    """
    Convert external IBD data to Bonsai's internal format.
    
    Args:
        external_segment_data: IBD data in external format
        format_type: String identifying the external format
        
    Returns:
        IBD segments in Bonsai's internal format
    """
    bonsai_segments = []
    
    if format_type == "23andme":
        # Convert 23andMe format
        for segment in external_segment_data:
            bonsai_segments.append({
                "chromosome": segment["chromosome"],
                "start_pos": int(segment["start_point"]),
                "end_pos": int(segment["end_point"]),
                "cm_length": float(segment["centimorgans"]),
                "snp_count": int(segment["snps"])
            })
    
    elif format_type == "ancestry":
        # Convert Ancestry.com format
        for segment in external_segment_data:
            bonsai_segments.append({
                "chromosome": segment["Chr"],
                "start_pos": int(segment["Start"]),
                "end_pos": int(segment["End"]),
                "cm_length": float(segment["cM"]),
                "snp_count": int(segment.get("SNPs", 0))
            })
    
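    # More format conversions would be added as further elif branches...

    else:
        # Fail loudly instead of silently returning an empty list
        raise ValueError(f"Unsupported IBD format: {format_type!r}")

    return bonsai_segments
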
Data Transformation Challenges

Converting between different data formats presents several challenges:

  • Information Loss: Some formats contain less information than others
  • Coordinate Systems: Different genomic coordinate references
  • Identifier Mapping: Reconciling different individual identifiers
  • Quality Variations: Varying data quality and completeness

Bonsai's data exchange utilities include mechanisms to handle these challenges and maintain data integrity during format conversions.
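
The information-loss point can be illustrated with a short sketch that collapses segment-level data into the kind of summary statistics described earlier. The function name `summarize_segments` is hypothetical; the input keys match the segment dictionaries in the conversion example, and the output keys match those read by the DRUID integration example.

```python
def summarize_segments(segments):
    """Collapse segment-level IBD records into summary statistics."""
    lengths = [float(seg["cm_length"]) for seg in segments]
    return {
        "shared_cm": sum(lengths),
        "num_segments": len(lengths),
        # default=0.0 covers the case of a pair with no segments at all
        "longest_segment": max(lengths, default=0.0),
    }
```

The conversion only goes one way: chromosome positions cannot be recovered from the three summary numbers, so segment data should be kept whenever a platform provides it.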

API Integration

Programmatic Access and Control

Bonsai v3 provides several API mechanisms for programmatic integration with other systems:

Python API

Bonsai's primary API is its Python interface, which allows direct integration with other Python-based tools:

# Example of using Bonsai's Python API
from bonsai.v3 import PedigreeBuilder, IBDProcessor

# Initialize Bonsai components
ibd_processor = IBDProcessor()
pedigree_builder = PedigreeBuilder()

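# Process IBD data (raw_segments is assumed to be a list of segment
# records loaded earlier, e.g. from a testing-platform export)
processed_ibd = ibd_processor.process_segments(raw_segments)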

# Build pedigree from processed IBD
pedigree = pedigree_builder.build_from_ibd(processed_ibd)

# Export results in desired format
pedigree.export_to_gedcom("results.ged")

This API enables seamless integration with other Python-based genetic and genealogical tools, creating unified analysis workflows.

Command-Line Interface

For integration with non-Python systems, Bonsai provides a command-line interface:

# Example of command-line integration
$ bonsai-process --input segments.csv --format 23andme --output processed.json
$ bonsai-build --input processed.json --output pedigree.ged

This command-line interface enables easy integration with shell scripts, workflows, and other command-line tools.
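
A script written in another tool could drive these two commands through `subprocess`. The sketch below simply reuses the command names and flags from the example above, which have not been checked against an installed CLI; the helper names `build_bonsai_commands` and `run_bonsai_cli` are hypothetical.

```python
import subprocess


def build_bonsai_commands(segments_csv, fmt, processed_json, pedigree_ged):
    """Return the two argv lists for the process-then-build sequence."""
    return [
        ["bonsai-process", "--input", segments_csv,
         "--format", fmt, "--output", processed_json],
        ["bonsai-build", "--input", processed_json,
         "--output", pedigree_ged],
    ]


def run_bonsai_cli(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        # check=True raises CalledProcessError on a non-zero exit code
        subprocess.run(argv, check=True)
```

Passing arguments as a list avoids shell quoting problems with file names that contain spaces.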

Web API

For distributed or service-oriented architectures, Bonsai can be deployed with a REST API:

# Example of REST API access
POST /api/v1/process-ibd
{
  "segments": [...],
  "format": "23andme"
}

Response:
{
  "processed_data": [...],
  "statistics": {...}
}

POST /api/v1/build-pedigree
{
  "processed_data": [...],
  "parameters": {...}
}

Response:
{
  "pedigree": {...},
  "statistics": {...}
}

This web API enables integration with web applications, cloud-based services, and other distributed systems.
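
A client for the first endpoint might be sketched with the standard library alone. The path and payload keys are copied from the example above; the base URL is whatever host the service is deployed on, and the function name `build_process_ibd_request` is hypothetical.

```python
import json
import urllib.request


def build_process_ibd_request(base_url, segments, fmt):
    """Build (but do not send) a POST request for /api/v1/process-ibd."""
    payload = json.dumps({"segments": segments, "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/process-ibd",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request is then a matter of `urllib.request.urlopen(request)` and decoding the JSON body of the response.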

Integration with DNA Testing Platforms

Connecting with Commercial and Research Platforms

Bonsai v3 includes specific integration mechanisms for major DNA testing platforms, enabling direct data exchange and analysis coordination:

Supported Testing Platforms
  • 23andMe: Personal genome testing focused on health and ancestry
  • AncestryDNA: Genealogy-focused genetic testing service
  • Family Tree DNA: Service focusing on genetic genealogy and deep ancestry
  • MyHeritage DNA: Combined genetic testing and family tree service
  • LivingDNA: Testing with detailed geographic ancestry resolution
  • All of Us: NIH research program with genetic data
  • UK Biobank: Large-scale biomedical database and research resource

Integration Approaches

Bonsai supports several methods for integrating with these platforms:

  1. Data Import: Reading raw data files downloaded from testing platforms
  2. API Connections: Direct API integration where supported
  3. Format Conversion: Converting between platform-specific and standard formats
  4. Browser Extensions: Supporting data extraction from web interfaces

23andMe Integration Example

# Example of integrating with 23andMe data
def process_23andme_data(raw_data_file, matches_file):
    """
    Process 23andMe data files for Bonsai analysis.
    
    Args:
        raw_data_file: Path to 23andMe raw data file
        matches_file: Path to 23andMe matches CSV export
        
    Returns:
        Processed data ready for Bonsai analysis
    """
    # Load and parse raw genotype data
    genotypes = parse_23andme_raw_data(raw_data_file)
    
    # Load and parse matches data
    matches = parse_23andme_matches(matches_file)
    
    # Convert to Bonsai format
    bonsai_segments = []
    for match in matches:
        match_segments = extract_segments_from_match(match)
        bonsai_segments.extend(match_segments)
    
    # Process with Bonsai
    processed_data = ibd_processor.process_segments(bonsai_segments)
    
    return processed_data
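
The helper `parse_23andme_raw_data` is left undefined above. As a rough sketch, a 23andMe raw data download is a tab-separated text file with `#` comment lines followed by four columns: rsid, chromosome, position, and genotype. The function below parses lines in that layout; its name is hypothetical, and it takes any iterable of lines so that it can be tried without a real file.

```python
def parse_23andme_raw_lines(lines):
    """Parse raw-data lines into {rsid: (chromosome, position, genotype)}."""
    genotypes = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and the commented header block
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # ignore malformed rows in this sketch
        rsid, chromosome, position, genotype = fields
        genotypes[rsid] = (chromosome, int(position), genotype)
    return genotypes
```

With a real download it could be called as `parse_23andme_raw_lines(open(path, encoding="utf-8"))`.
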
Platform-Specific Considerations

Each testing platform has unique characteristics that affect integration:

  • Data Completeness: Some platforms provide more detailed data than others
  • Access Mechanisms: Varying API availability and data export options
  • Coordinate Systems: Different genomic build references
  • Privacy Controls: Platform-specific restrictions on data sharing

Bonsai's integration modules account for these differences to provide consistent analysis capabilities across platforms.

Integration with Family Tree Systems

Combining Genetic and Documentary Evidence

One of the most powerful aspects of genetic genealogy is the integration of genetic evidence with traditional family tree information. Bonsai v3 supports bidirectional integration with family tree systems:

Family Tree Import

Bonsai can import existing family tree data to:

  • Provide Context: Using known relationships to inform genetic analysis
  • Generate Hypotheses: Creating relationship hypotheses to test with genetic data
  • Pre-populate Pedigrees: Starting with documentary pedigrees and confirming/extending with genetic evidence
  • Identify Gaps: Finding areas where genetic evidence might resolve uncertainties

Family Tree Export

Bonsai can export its analysis results to family tree systems for:

  • Verification: Confirming documentary relationships with genetic evidence
  • Extension: Adding genetically discovered relationships to existing trees
  • Correction: Identifying and resolving contradictions between genetic and documentary evidence
  • Documentation: Recording confidence levels and evidence sources
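
The correction step can be sketched as a comparison of documented and genetically estimated degrees for each pair. This is an illustration only: the function name `find_degree_conflicts`, the use of `frozenset` pairs as keys, and the one-degree tolerance are all assumptions.

```python
def find_degree_conflicts(documented, genetic, tolerance=1.0):
    """Return (pair, documented_degree, genetic_degree) for disagreeing pairs.

    Both arguments map frozenset({id_a, id_b}) to a relationship degree.
    """
    conflicts = []
    for pair, doc_degree in documented.items():
        gen_degree = genetic.get(pair)
        if gen_degree is None:
            continue  # no genetic estimate for this pair, nothing to compare
        if abs(gen_degree - doc_degree) > tolerance:
            conflicts.append((pair, doc_degree, gen_degree))
    return conflicts
```

A flagged pair is a prompt for further research, not a verdict: the discrepancy may come from the tree, from the genetic estimate, or from a sample mix-up.
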
GEDCOM Integration Example
# Example of integrating with GEDCOM family tree data
def integrate_gedcom_with_genetic_data(gedcom_file, genetic_data):
    """
    Integrate GEDCOM family tree with genetic data in Bonsai.
    
    Args:
        gedcom_file: Path to GEDCOM file
        genetic_data: Processed genetic data from Bonsai
        
    Returns:
        Integrated pedigree with both documentary and genetic evidence
    """
    # Parse GEDCOM file
    gedcom_pedigree = parse_gedcom(gedcom_file)
    
    # Convert to Bonsai pedigree format
    documentary_pedigree = convert_to_bonsai_pedigree(gedcom_pedigree)
    
    # Create genetic pedigree (same builder call as in the Python API example)
    genetic_pedigree = pedigree_builder.build_from_ibd(genetic_data)
    
    # Integrate the pedigrees
    integrated_pedigree = pedigree_integrator.integrate_pedigrees(
        documentary_pedigree,
        genetic_pedigree,
        conflict_resolution="genetic_priority"
    )
    
    # Annotate with confidence information
    annotated_pedigree = confidence_annotator.annotate_pedigree(
        integrated_pedigree,
        genetic_data
    )
    
    return annotated_pedigree
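
The `parse_gedcom` step is not shown above. As a minimal sketch, the family structure in a GEDCOM file can be read from its FAM records, which list parents with HUSB and WIFE tags and children with CHIL tags. The function below, with the hypothetical name `parse_gedcom_families`, extracts only those links and ignores names, dates, continuation lines, and everything else a full parser would handle.

```python
def parse_gedcom_families(lines):
    """Return {child_id: [parent_ids]} from the FAM records of a GEDCOM file."""
    families = []
    current = None
    for raw in lines:
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level, tag = parts[0], parts[1]
        if level == "0":
            # Record headers look like "0 @F1@ FAM"
            if len(parts) == 3 and parts[2] == "FAM":
                current = {"parents": [], "children": []}
                families.append(current)
            else:
                current = None
        elif current is not None and level == "1" and len(parts) == 3:
            if tag in ("HUSB", "WIFE"):
                current["parents"].append(parts[2])
            elif tag == "CHIL":
                current["children"].append(parts[2])
    parents_of = {}
    for family in families:
        for child in family["children"]:
            parents_of.setdefault(child, []).extend(family["parents"])
    return parents_of
```

The resulting child-to-parents mapping is a convenient starting point for converting a documentary tree into whatever pedigree structure the genetic analysis expects.
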
Supported Family Tree Systems

Bonsai can integrate with various family tree systems:

  • Desktop Software: Programs like Family Tree Maker, RootsMagic, Legacy
  • Online Services: Platforms like Ancestry, MyHeritage, FamilySearch
  • Open Source Systems: Tools like Gramps, webtrees
  • Research Databases: Specialized academic and professional systems

Integration with IBD Detection Tools

Leveraging Specialized Detection Algorithms

Bonsai focuses on relationship inference from IBD data, but often relies on specialized external tools for the initial IBD detection. Bonsai v3 includes integration mechanisms for several IBD detection tools:

Supported IBD Detection Tools
  • GERMLINE: Fast IBD detection for large datasets
  • Refined-IBD: High-precision IBD detection
  • IBDseq: IBD detection for sequencing data
  • IBIS: Phase-free IBD detection for unphased genotype data
  • hap-IBD: Haplotype-based IBD detection
  • iLASH: IBD detection for biobank-scale data

Integration Workflow
  1. Input Preparation: Formatting genetic data for IBD detection tools
  2. Tool Execution: Running the detection algorithm (directly or via wrappers)
  3. Result Processing: Converting detection results to Bonsai's internal format
  4. Quality Assessment: Evaluating the reliability of detected segments
  5. Normalization: Adjusting for tool-specific biases and characteristics
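
A very simple form of the quality-assessment step is a length filter, since short segments are more likely to be false positives. The sketch below assumes segments in the dictionary layout used earlier in this lab; the function name `filter_short_segments` and the 7 cM default are illustrative choices, and the right cutoff depends on the detection tool and the data.

```python
def filter_short_segments(segments, min_cm=7.0):
    """Return (kept_segments, number_dropped) for a minimum length in cM."""
    kept = [seg for seg in segments if float(seg["cm_length"]) >= min_cm]
    dropped = len(segments) - len(kept)
    return kept, dropped
```

Reporting the number of dropped segments makes it easier to notice when a threshold is discarding most of a tool's output.
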
Refined-IBD Integration Example
# Example of integrating with Refined-IBD
def process_with_refined_ibd(vcf_file, map_file):
    """
    Process VCF data with Refined-IBD and integrate with Bonsai.
    
    Args:
        vcf_file: Path to VCF file with genetic data
        map_file: Path to genetic map file
        
    Returns:
        Processed IBD segments ready for Bonsai analysis
    """
    # Prepare Refined-IBD input
    refined_ibd_input = prepare_refined_ibd_input(vcf_file, map_file)
    
    # Run Refined-IBD (external process)
    refined_ibd_output = run_refined_ibd(refined_ibd_input)
    
    # Parse Refined-IBD output
    detected_segments = parse_refined_ibd_output(refined_ibd_output)
    
    # Convert to Bonsai format
    bonsai_segments = convert_to_bonsai_format(detected_segments, "refined-ibd")
    
    # Process with Bonsai
    processed_segments = ibd_processor.process_segments(bonsai_segments)
    
    return processed_segments
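
For the parsing step, Refined IBD writes one segment per line with nine whitespace-separated fields: the two sample identifiers, each with a haplotype index, then chromosome, start position, end position, LOD score, and length in cM. That column order is stated here from memory of the tool's documentation and should be checked against the version in use. The function name `parse_refined_ibd_lines` is hypothetical.

```python
def parse_refined_ibd_lines(lines):
    """Parse Refined IBD output lines into segment dictionaries."""
    segments = []
    for line in lines:
        fields = line.split()
        if len(fields) < 9:
            continue  # skip blank or truncated lines
        segments.append({
            "id1": fields[0],
            "hap1": int(fields[1]),
            "id2": fields[2],
            "hap2": int(fields[3]),
            "chromosome": fields[4],
            "start_pos": int(fields[5]),
            "end_pos": int(fields[6]),
            "lod": float(fields[7]),
            "cm_length": float(fields[8]),
        })
    return segments
```

The keys `chromosome`, `start_pos`, `end_pos`, and `cm_length` are chosen to line up with the segment layout used in the format conversion example.
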
Tool Selection Considerations

Different IBD detection tools have different strengths and limitations:

| Tool | Strengths | Limitations | Best For |
| --- | --- | --- | --- |
| GERMLINE | Speed, scalability | Lower precision | Large datasets, initial screening |
| Refined-IBD | Accuracy, modeled error rates | Computational intensity | High-precision requirements |
| hap-IBD | Robust to phasing errors | Complex parameters | Datasets with phasing challenges |
| IBIS | Works with unphased data | Less reliable for short segments | Unphased consumer data |

Bonsai's integration modules account for these differences and can adjust its analysis approach based on the IBD detection tool used.

Creating Integrated Workflows

End-to-End Genetic Genealogy Processes

By combining Bonsai v3 with other tools and systems, researchers can create comprehensive genetic genealogy workflows tailored to specific research questions and contexts:

Example Workflow: Unknown Parentage Case
  1. Data Collection: Testing with multiple platforms for maximum match coverage
  2. IBD Detection: Using specialized tools to identify shared DNA segments
  3. Relationship Inference: Using Bonsai to predict relationships from IBD patterns
  4. Match Clustering: Grouping matches by likely family branches
  5. Tree Building: Constructing partial family trees for each cluster
  6. Common Ancestor Identification: Finding connecting points between trees
  7. Hypothesis Validation: Using documentary research to verify predictions
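
The match clustering step can be approximated by treating shared matches as edges in a graph and taking its connected components. This is a deliberately simple stand-in, not Bonsai's method, and the function name `cluster_matches` is hypothetical. Each input pair means that two matches also match each other.

```python
from collections import defaultdict


def cluster_matches(shared_match_pairs):
    """Group matches into clusters (connected components of the pair graph)."""
    neighbors = defaultdict(set)
    for a, b in shared_match_pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    clusters = []
    seen = set()
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first search collects everyone reachable from this match
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(neighbors[node] - seen)
        clusters.append(sorted(component))
    return clusters
```

Connected components tend to over-merge when a single match links two family branches, so in practice clusters are usually reviewed by hand or refined with a stricter criterion.
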
Example Workflow: Population Study
  1. Sample Collection: Gathering genetic data from the population of interest
  2. Admixture Analysis: Using population genetics tools to assess ancestry
  3. IBD Detection: Identifying shared segments within the population
  4. Relationship Network Construction: Using Bonsai to build a comprehensive relationship network
  5. Historical Context Integration: Incorporating documentary and demographic information
  6. Network Analysis: Applying social network analysis to the relationship structure
  7. Visualization and Reporting: Presenting the findings with appropriate visualizations

Integration Pipeline Example

# Example of a complete integration pipeline
def run_integrated_workflow(raw_data_files, known_relationships=None):
    """
    Run a complete integrated genetic genealogy workflow.
    
    Args:
        raw_data_files: List of paths to raw genetic data files
        known_relationships: Optional dict of known family relationships
        
    Returns:
        Complete analysis results
    """
    # Phase 1: Data preparation
    processed_data = []
    for file_path in raw_data_files:
        file_format = detect_file_format(file_path)
        processed_file = process_raw_data(file_path, file_format)
        processed_data.append(processed_file)
    
    # Phase 2: IBD detection (using appropriate external tool)
    ibd_segments = detect_ibd_segments(processed_data)
    
    # Phase 3: Relationship inference with Bonsai
    relationship_predictions = bonsai_analyzer.infer_relationships(ibd_segments)
    
    # Phase 4: Family tree integration
    if known_relationships:
        integrated_pedigree = integrate_with_known_relationships(
            relationship_predictions, 
            known_relationships
        )
    else:
        integrated_pedigree = build_pedigree_from_predictions(relationship_predictions)
    
    # Phase 5: Visualization and reporting
    visualizations = generate_visualizations(integrated_pedigree)
    report = generate_analysis_report(integrated_pedigree, relationship_predictions)
    
    return {
        "pedigree": integrated_pedigree,
        "relationships": relationship_predictions,
        "visualizations": visualizations,
        "report": report
    }

Conclusion and Next Steps

Bonsai v3's integration capabilities enable it to function as a key component in comprehensive genetic genealogy workflows, connecting with DNA testing platforms, family tree systems, IBD detection tools, and other specialized resources. Through mechanisms like the DRUID algorithm, standardized data exchange formats, and flexible APIs, Bonsai can adapt to diverse research contexts and leverage complementary tools to enhance its relationship inference capabilities.

By understanding and utilizing these integration mechanisms, researchers can create powerful, customized workflows that combine the strengths of multiple tools and data sources to address complex genetic genealogy challenges.

In the next lab, we'll explore how to implement end-to-end pedigree reconstruction pipelines using Bonsai v3, integrating all the components we've studied throughout this course.

Interactive Lab Environment

Run the interactive Lab 28 notebook in Google Colab:

Google Colab Environment

Run the notebook in Google Colab for a powerful computing environment with access to Google's resources.

Data will be automatically downloaded from S3 when you run the notebook.

Note: You may need a Google account to save your work in Google Drive.
