Lab 28: Integration with Other Genealogical Tools
Core Component: This lab explores how Bonsai v3 integrates with other genealogical tools and systems, particularly through the DRUID algorithm and other integration mechanisms. Understanding these integration capabilities is essential for creating comprehensive genetic genealogy workflows that leverage multiple data sources and analytical approaches.
Beyond Standalone Analysis
The Integration Imperative
While Bonsai v3 provides powerful genetic relationship inference capabilities, real-world genetic genealogy typically involves multiple tools and data sources. Effective integration with other systems enables more comprehensive analysis and better results:
Key Integration Benefits
- Complementary Capabilities: Different tools excel at different aspects of genetic genealogy
- Multiple Data Sources: Incorporating DNA, documentary, and contextual information
- Workflow Continuity: Supporting end-to-end genetic genealogy processes
- Expertise Leverage: Utilizing specialized algorithms from various domains
- Ecosystem Compatibility: Fitting into existing user workflows and tool chains
Bonsai v3's integration capabilities enable it to function both as a standalone analysis tool and as a component within larger genetic genealogy workflows.
The Genetic Genealogy Ecosystem
Bonsai integrates with several categories of external tools and systems:
- DNA Testing Platforms: Direct-to-consumer testing companies and research databases
- Family Tree Systems: Genealogical record management tools
- IBD Detection Tools: Specialized algorithms for identifying IBD segments
- Population Genetics Software: Tools for analyzing population structure and admixture
- Visualization Systems: Specialized tools for representing genetic relationships
The DRUID Algorithm
Degree Relationship Using IBD Data
One of Bonsai v3's key integration mechanisms is the DRUID (Degree Relationship Using IBD Data) algorithm, implemented in the druid.py module. This algorithm provides standardized relationship inference that can integrate with external systems:
DRUID Core Functionality
The DRUID algorithm uses a generalized approach to infer relationship degrees from IBD sharing data:
```python
from typing import Optional


def infer_degree_generalized_druid(
    total_ibd: float,
    num_segments: Optional[int] = None,
    longest_segment: Optional[float] = None,
    total_full_ibd: Optional[float] = None,
):
    """
    Infer relationship degree using the generalized DRUID algorithm.

    This algorithm estimates the degree of relationship based on
    total IBD sharing and optional segment characteristics.

    Args:
        total_ibd: Total IBD sharing in centiMorgans
        num_segments: Optional number of IBD segments
        longest_segment: Optional length of longest segment in cM
        total_full_ibd: Optional total fully identical region length

    Returns:
        Estimated relationship degree (1.0 = first degree, etc.)
    """
    # Implementation uses model-based prediction of relationship degree
    # based on IBD statistics, calibrated with known relationships
```
This function provides a standardized interface for relationship inference that external systems can easily incorporate, without needing to understand Bonsai's more complex internal mechanisms.
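To make the notion of a relationship degree concrete, the sketch below converts total IBD into a degree using the textbook rule that expected sharing halves with each additional degree. It is a simplified stand-in, not Bonsai's `infer_degree_generalized_druid`: the function name `estimate_degree_from_total_ibd` and the 3400 cM autosomal map length are assumptions chosen for illustration, and fully identical (IBD2) regions are ignored.

```python
import math

# Assumed autosomal genetic map length in cM (illustrative; real maps vary).
ASSUMED_GENOME_CM = 3400.0


def estimate_degree_from_total_ibd(total_ibd_cm: float,
                                   genome_cm: float = ASSUMED_GENOME_CM) -> float:
    """Rough degree estimate from total IBD sharing.

    Uses sharing fraction r = total / (2 * genome) and degree = -log2(r),
    so a parent and child sharing one full copy of the genome get degree 1.
    """
    if total_ibd_cm <= 0:
        raise ValueError("total_ibd_cm must be positive")
    # Cap the fraction at 0.5 so the estimate never drops below first degree.
    fraction = min(total_ibd_cm / (2.0 * genome_cm), 0.5)
    return -math.log2(fraction)


if __name__ == "__main__":
    for shared in (3400.0, 1700.0, 850.0):
        print(shared, round(estimate_degree_from_total_ibd(shared), 2))
```

Under these assumptions, 3400 cM maps to degree 1, 1700 cM to degree 2, and 850 cM to degree 3. A calibrated model such as the one described above would also use segment counts and lengths, which this sketch does not.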
DRUID Integration Example
```python
# Example of how an external tool might use DRUID
def analyze_match_with_druid(match_data):
    """
    Analyze a DNA match using the DRUID algorithm.

    Args:
        match_data: Dictionary with match statistics

    Returns:
        Dictionary with relationship prediction
    """
    # Extract IBD statistics from match data
    total_ibd = match_data['shared_cm']
    num_segments = match_data.get('num_segments')
    longest_segment = match_data.get('longest_segment')

    # Call DRUID algorithm
    degree = infer_degree_generalized_druid(
        total_ibd=total_ibd,
        num_segments=num_segments,
        longest_segment=longest_segment
    )

    # Convert degree to relationship description
    relationship = degree_to_relationship(degree)

    return {
        'predicted_degree': degree,
        'relationship_description': relationship,
        'confidence': calculate_confidence(total_ibd, degree)
    }
```
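The example above calls a `degree_to_relationship` helper without showing it. One possible version is sketched here; the lookup table and wording are assumptions for illustration rather than Bonsai's own mapping, and each degree deliberately lists several relationships because degree alone cannot distinguish them.

```python
import math

# Hypothetical lookup table: typical relationships for each integer degree.
DEGREE_LABELS = {
    1: "parent/child or full sibling",
    2: "grandparent, half sibling, or aunt/uncle",
    3: "first cousin or great-grandparent",
    4: "first cousin once removed",
    5: "second cousin",
}


def degree_to_relationship(degree: float) -> str:
    """Map a (possibly fractional) degree to a human-readable description."""
    nearest = math.floor(degree + 0.5)  # round half up to the nearest degree
    if nearest < 1:
        return "closer than first degree (identical twins or duplicate samples)"
    return DEGREE_LABELS.get(nearest, f"distant relative (about degree {nearest})")


if __name__ == "__main__":
    for value in (1.0, 2.9, 9.2):
        print(value, "->", degree_to_relationship(value))
```

Returning a set of candidate relationships per degree keeps the output honest: distinguishing a half sibling from a grandparent needs extra evidence such as ages or segment patterns.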
Key DRUID Advantages
- Simplicity: Straightforward interface requiring minimal data
- Standardization: Consistent relationship degree scale
- Robustness: Works with varied input quality and completeness
- Calibration: Empirically calibrated with known relationships
- Extensibility: Can incorporate additional evidence when available
Data Exchange Formats
Standardized Information Transfer
Effective integration requires standardized data exchange formats. Bonsai v3 supports several key formats for importing and exporting genetic and relationship data:
IBD Data Formats
Bonsai supports several common IBD data formats:
Format | Description | Common Sources |
---|---|---|
Phased IBD Format | Detailed segment data with phase information | Research tools like hap-IBD, Refined-IBD |
Unphased Segment Format | Simpler format without phase information | Consumer testing companies, GERMLINE |
Summary Statistics Format | Aggregated IBD metrics without segment details | Consumer websites, limited data sharing |
Match List Format | Simple listing of genetic matches and basic metrics | Consumer testing platforms, simple exports |
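The relationship between the segment-level and summary formats in the table can be sketched as a simple aggregation. The function below collapses a list of segments into the three statistics that the DRUID interface accepts; the name `summarize_segments` is hypothetical, and the `cm_length` field is borrowed from the conversion example later in this lab.

```python
from typing import Dict, List


def summarize_segments(segments: List[Dict]) -> Dict[str, float]:
    """Collapse segment-level IBD data into summary statistics."""
    lengths = [float(segment["cm_length"]) for segment in segments]
    return {
        "total_ibd": sum(lengths),                     # total shared cM
        "num_segments": len(lengths),                  # number of segments
        "longest_segment": max(lengths, default=0.0),  # longest segment in cM
    }


if __name__ == "__main__":
    example = [{"cm_length": 12.5}, {"cm_length": 30.0}, {"cm_length": 7.5}]
    print(summarize_segments(example))
```

The conversion only goes one way: segment positions cannot be recovered from a summary, which is one form of the information loss discussed under data transformation challenges.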
Pedigree Data Formats
For exchanging pedigree information, Bonsai supports:
- GEDCOM: Standard genealogical data exchange format
- CSV Relationship Format: Simple tabular relationship data
- JSON Pedigree Format: Hierarchical pedigree representation
- Graph Exchange XML Format (GEXF): Standard format for network structures
Format Conversion Example
```python
# Example of converting between IBD formats
def convert_to_bonsai_format(external_segment_data, format_type):
    """
    Convert external IBD data to Bonsai's internal format.

    Args:
        external_segment_data: IBD data in external format
        format_type: String identifying the external format

    Returns:
        IBD segments in Bonsai's internal format
    """
    bonsai_segments = []

    if format_type == "23andme":
        # Convert 23andMe format
        for segment in external_segment_data:
            bonsai_segments.append({
                "chromosome": segment["chromosome"],
                "start_pos": int(segment["start_point"]),
                "end_pos": int(segment["end_point"]),
                "cm_length": float(segment["centimorgans"]),
                "snp_count": int(segment["snps"])
            })
    elif format_type == "ancestry":
        # Convert Ancestry.com format
        for segment in external_segment_data:
            bonsai_segments.append({
                "chromosome": segment["Chr"],
                "start_pos": int(segment["Start"]),
                "end_pos": int(segment["End"]),
                "cm_length": float(segment["cM"]),
                "snp_count": int(segment.get("SNPs", 0))
            })
    # More format conversions...

    return bonsai_segments
```
Data Transformation Challenges
Converting between different data formats presents several challenges:
- Information Loss: Some formats contain less information than others
- Coordinate Systems: Different genomic coordinate references
- Identifier Mapping: Reconciling different individual identifiers
- Quality Variations: Varying data quality and completeness
Bonsai's data exchange utilities include mechanisms to handle these challenges and maintain data integrity during format conversions.
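As one illustration of the identifier-mapping challenge listed above, the sketch below rewrites platform-specific sample IDs to canonical ones and reports any IDs it could not map instead of silently dropping them. The function name, the `id1`/`id2` record keys, and the example IDs are all assumptions made for this example.

```python
from typing import Dict, Iterable, List, Tuple


def remap_sample_ids(records: Iterable[Dict],
                     id_map: Dict[str, str]) -> Tuple[List[Dict], List[str]]:
    """Rewrite 'id1'/'id2' using id_map; return (remapped records, unmapped IDs)."""
    remapped: List[Dict] = []
    unmapped = set()
    for record in records:
        missing = {record[key] for key in ("id1", "id2") if record[key] not in id_map}
        if missing:
            # Keep track of unknown IDs so the caller can resolve them.
            unmapped |= missing
            continue
        updated = dict(record)  # copy so the input is left untouched
        updated["id1"] = id_map[record["id1"]]
        updated["id2"] = id_map[record["id2"]]
        remapped.append(updated)
    return remapped, sorted(unmapped)


if __name__ == "__main__":
    records = [
        {"id1": "kitA", "id2": "kitB", "cm_length": 42.0},
        {"id1": "kitA", "id2": "kitZ", "cm_length": 9.0},
    ]
    print(remap_sample_ids(records, {"kitA": "P1", "kitB": "P2"}))
```

Returning the unmapped IDs alongside the converted records makes the loss visible, which matters when the same person has tested on several platforms under different identifiers.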
API Integration
Programmatic Access and Control
Bonsai v3 provides several API mechanisms for programmatic integration with other systems:
Python API
Bonsai's primary API is its Python interface, which allows direct integration with other Python-based tools:
```python
# Example of using Bonsai's Python API
from bonsai.v3 import PedigreeBuilder, IBDProcessor

# Initialize Bonsai components
ibd_processor = IBDProcessor()
pedigree_builder = PedigreeBuilder()

# Process IBD data
processed_ibd = ibd_processor.process_segments(raw_segments)

# Build pedigree from processed IBD
pedigree = pedigree_builder.build_from_ibd(processed_ibd)

# Export results in desired format
pedigree.export_to_gedcom("results.ged")
```
This API enables seamless integration with other Python-based genetic and genealogical tools, creating unified analysis workflows.
Command-Line Interface
For integration with non-Python systems, Bonsai provides a command-line interface:
```bash
# Example of command-line integration
$ bonsai-process --input segments.csv --format 23andme --output processed.json
$ bonsai-build --input processed.json --output pedigree.ged
```
This command-line interface enables easy integration with shell scripts, workflows, and other command-line tools.
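A script in another tool chain might drive these two commands through `subprocess`. The sketch below assumes the command names and flags behave exactly as in the example above; the wrapper functions themselves are hypothetical. Building the argument lists separately from running them makes the wrapper easy to test without Bonsai installed.

```python
import subprocess
from typing import Callable, List


def build_bonsai_commands(segments_csv: str, fmt: str,
                          processed_json: str, pedigree_ged: str) -> List[List[str]]:
    """Return argument lists for the two-step pipeline shown above."""
    return [
        ["bonsai-process", "--input", segments_csv,
         "--format", fmt, "--output", processed_json],
        ["bonsai-build", "--input", processed_json, "--output", pedigree_ged],
    ]


def run_bonsai_pipeline(commands: List[List[str]],
                        runner: Callable = subprocess.run) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        # check=True raises CalledProcessError on a non-zero exit code.
        runner(argv, check=True)


if __name__ == "__main__":
    for argv in build_bonsai_commands("segments.csv", "23andme",
                                      "processed.json", "pedigree.ged"):
        print(" ".join(argv))
```

Passing arguments as lists rather than a single shell string avoids quoting problems with file names that contain spaces.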
Web API
For distributed or service-oriented architectures, Bonsai can be deployed with a REST API:
```
# Example of REST API access
POST /api/v1/process-ibd
{
  "segments": [...],
  "format": "23andme"
}

Response:
{
  "processed_data": [...],
  "statistics": {...}
}

POST /api/v1/build-pedigree
{
  "processed_data": [...],
  "parameters": {...}
}

Response:
{
  "pedigree": {...},
  "statistics": {...}
}
```
This web API enables integration with web applications, cloud-based services, and other distributed systems.
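A client for the `process-ibd` endpoint could be assembled with the Python standard library alone. This is a minimal sketch: the endpoint path and payload keys come from the example above, while the base URL, the function name, and the assumption that the service accepts JSON with a `Content-Type: application/json` header are illustrative.

```python
import json
import urllib.request
from typing import Dict, List


def build_process_ibd_request(base_url: str, segments: List[Dict],
                              fmt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/v1/process-ibd."""
    payload = json.dumps({"segments": segments, "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/process-ibd",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical local deployment; replace with the real service address.
    request = build_process_ibd_request("http://localhost:8000", [], "23andme")
    print(request.get_method(), request.full_url)
```

Sending the request is then a matter of calling `urllib.request.urlopen(request)` and decoding the JSON response body.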
Integration with DNA Testing Platforms
Connecting with Commercial and Research Platforms
Bonsai v3 includes specific integration mechanisms for major DNA testing platforms, enabling direct data exchange and analysis coordination:
Supported Testing Platforms
- 23andMe: Personal genome testing focused on health and ancestry
- AncestryDNA: Genealogy-focused genetic testing service
- Family Tree DNA: Service focusing on genetic genealogy and deep ancestry
- MyHeritage DNA: Combined genetic testing and family tree service
- Living DNA: Testing with detailed geographic ancestry resolution
- All of Us: NIH research program with genetic data
- UK Biobank: Large-scale biomedical database and research resource
Integration Approaches
Bonsai supports several methods for integrating with these platforms:
- Data Import: Reading raw data files downloaded from testing platforms
- API Connections: Direct API integration where supported
- Format Conversion: Converting between platform-specific and standard formats
- Browser Extensions: Supporting data extraction from web interfaces
23andMe Integration Example
```python
# Example of integrating with 23andMe data
def process_23andme_data(raw_data_file, matches_file):
    """
    Process 23andMe data files for Bonsai analysis.

    Args:
        raw_data_file: Path to 23andMe raw data file
        matches_file: Path to 23andMe matches CSV export

    Returns:
        Processed data ready for Bonsai analysis
    """
    # Load and parse raw genotype data
    genotypes = parse_23andme_raw_data(raw_data_file)

    # Load and parse matches data
    matches = parse_23andme_matches(matches_file)

    # Convert to Bonsai format
    bonsai_segments = []
    for match in matches:
        match_segments = extract_segments_from_match(match)
        bonsai_segments.extend(match_segments)

    # Process with Bonsai
    processed_data = ibd_processor.process_segments(bonsai_segments)

    return processed_data
```
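The raw-data parsing step in the example above might look like the sketch below. It assumes the layout commonly seen in 23andMe raw exports: `#`-prefixed comment lines followed by four tab-separated columns (rsid, chromosome, position, genotype). The function name `parse_23andme_raw_lines` and the sample rows are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def parse_23andme_raw_lines(lines: Iterable[str]) -> Dict[str, Tuple[str, int, str]]:
    """Parse raw-data lines into {rsid: (chromosome, position, genotype)}."""
    genotypes: Dict[str, Tuple[str, int, str]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) != 4 or not fields[2].isdigit():
            continue  # skip malformed rows rather than failing the whole file
        rsid, chromosome, position, genotype = fields
        genotypes[rsid] = (chromosome, int(position), genotype)
    return genotypes


if __name__ == "__main__":
    sample = [
        "# rsid\tchromosome\tposition\tgenotype",
        "rs0000001\t1\t12345\tAA",
        "rs0000002\tX\t67890\t--",
    ]
    print(parse_23andme_raw_lines(sample))
```

Accepting an iterable of lines rather than a path lets the same function handle open files, decompressed streams, and in-memory test data.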
Platform-Specific Considerations
Each testing platform has unique characteristics that affect integration:
- Data Completeness: Some platforms provide more detailed data than others
- Access Mechanisms: Varying API availability and data export options
- Coordinate Systems: Different genomic build references
- Privacy Controls: Platform-specific restrictions on data sharing
Bonsai's integration modules account for these differences to provide consistent analysis capabilities across platforms.
Integration with Family Tree Systems
Combining Genetic and Documentary Evidence
One of the most powerful aspects of genetic genealogy is the integration of genetic evidence with traditional family tree information. Bonsai v3 supports bidirectional integration with family tree systems:
Family Tree Import
Bonsai can import existing family tree data to:
- Provide Context: Using known relationships to inform genetic analysis
- Generate Hypotheses: Creating relationship hypotheses to test with genetic data
- Pre-populate Pedigrees: Starting with documentary pedigrees and confirming/extending with genetic evidence
- Identify Gaps: Finding areas where genetic evidence might resolve uncertainties
Family Tree Export
Bonsai can export its analysis results to family tree systems for:
- Verification: Confirming documentary relationships with genetic evidence
- Extension: Adding genetically discovered relationships to existing trees
- Correction: Identifying and resolving contradictions between genetic and documentary evidence
- Documentation: Recording confidence levels and evidence sources
GEDCOM Integration Example
```python
# Example of integrating with GEDCOM family tree data
def integrate_gedcom_with_genetic_data(gedcom_file, genetic_data):
    """
    Integrate GEDCOM family tree with genetic data in Bonsai.

    Args:
        gedcom_file: Path to GEDCOM file
        genetic_data: Processed genetic data from Bonsai

    Returns:
        Integrated pedigree with both documentary and genetic evidence
    """
    # Parse GEDCOM file
    gedcom_pedigree = parse_gedcom(gedcom_file)

    # Convert to Bonsai pedigree format
    documentary_pedigree = convert_to_bonsai_pedigree(gedcom_pedigree)

    # Create genetic pedigree
    genetic_pedigree = pedigree_builder.build_from_data(genetic_data)

    # Integrate the pedigrees
    integrated_pedigree = pedigree_integrator.integrate_pedigrees(
        documentary_pedigree,
        genetic_pedigree,
        conflict_resolution="genetic_priority"
    )

    # Annotate with confidence information
    annotated_pedigree = confidence_annotator.annotate_pedigree(
        integrated_pedigree,
        genetic_data
    )

    return annotated_pedigree
```
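The GEDCOM parsing step can be illustrated with a very small reader that pulls parent-child pairs out of family (`FAM`) records using the standard `HUSB`, `WIFE`, and `CHIL` tags. This is a sketch under simplifying assumptions, not a full GEDCOM parser: it ignores names, dates, and adoption or other pedigree-link qualifiers, and the function name is hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def extract_parent_child_pairs(gedcom_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (parent_id, child_id) pairs from the FAM records of a GEDCOM file."""
    families: Dict[str, Dict[str, List[str]]] = {}
    current: Optional[Dict[str, List[str]]] = None
    for raw in gedcom_lines:
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level, tag = parts[0], parts[1]
        if level == "0":
            # Level-0 records look like "0 @F1@ FAM": the xref id comes second.
            is_family = len(parts) == 3 and parts[2].strip() == "FAM"
            current = (families.setdefault(tag, {"parents": [], "children": []})
                       if is_family else None)
        elif level == "1" and current is not None and len(parts) == 3:
            if tag in ("HUSB", "WIFE"):
                current["parents"].append(parts[2].strip())
            elif tag == "CHIL":
                current["children"].append(parts[2].strip())
    return [(parent, child)
            for family in families.values()
            for parent in family["parents"]
            for child in family["children"]]


if __name__ == "__main__":
    example = ["0 HEAD", "0 @I1@ INDI", "1 NAME Ann /Example/",
               "0 @F1@ FAM", "1 HUSB @I2@", "1 WIFE @I1@", "1 CHIL @I3@", "0 TRLR"]
    print(extract_parent_child_pairs(example))
```

Pairs in this form are a convenient common denominator: they can be compared directly against first-degree relationships inferred from genetic data to flag agreement or conflict.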
Supported Family Tree Systems
Bonsai can integrate with various family tree systems:
- Desktop Software: Programs like Family Tree Maker, RootsMagic, Legacy
- Online Services: Platforms like Ancestry, MyHeritage, FamilySearch
- Open Source Systems: Tools like Gramps, webtrees
- Research Databases: Specialized academic and professional systems
Integration with IBD Detection Tools
Leveraging Specialized Detection Algorithms
Bonsai focuses on relationship inference from IBD data, but often relies on specialized external tools for the initial IBD detection. Bonsai v3 includes integration mechanisms for several IBD detection tools:
Supported IBD Detection Tools
- GERMLINE: Fast IBD detection for large datasets
- Refined-IBD: High-precision IBD detection
- IBDseq: IBD detection for sequencing data
- IBIS: IBD by IBS, segment detection that works directly on unphased genotype data
- hap-IBD: Haplotype-based IBD detection
- iLASH: IBD detection for biobank-scale data
Integration Workflow
1. Input Preparation: Formatting genetic data for IBD detection tools
2. Tool Execution: Running the detection algorithm (directly or via wrappers)
3. Result Processing: Converting detection results to Bonsai's internal format
4. Quality Assessment: Evaluating the reliability of detected segments
5. Normalization: Adjusting for tool-specific biases and characteristics
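The quality-assessment step often starts with simple thresholds on segment length and marker count. The sketch below shows one such filter; the function name, the 7 cM default, and the handling of missing SNP counts are illustrative assumptions rather than Bonsai's actual settings, and the field names follow the conversion example earlier in this lab.

```python
from typing import Dict, List, Tuple


def filter_segments(segments: List[Dict], min_cm: float = 7.0,
                    min_snps: int = 0) -> Tuple[List[Dict], List[Dict]]:
    """Split segments into (kept, dropped) using length and SNP-count thresholds."""
    kept: List[Dict] = []
    dropped: List[Dict] = []
    for segment in segments:
        too_short = float(segment["cm_length"]) < min_cm
        snp_count = segment.get("snp_count")
        # A missing SNP count is treated as unknown, not as a failure.
        too_sparse = snp_count is not None and int(snp_count) < min_snps
        (dropped if too_short or too_sparse else kept).append(segment)
    return kept, dropped


if __name__ == "__main__":
    example = [
        {"cm_length": 12.0, "snp_count": 900},
        {"cm_length": 3.5, "snp_count": 200},
        {"cm_length": 8.0},
    ]
    kept, dropped = filter_segments(example, min_cm=7.0, min_snps=500)
    print(len(kept), "kept,", len(dropped), "dropped")
```

Returning the dropped segments as well makes it possible to audit how much sharing a threshold removes before committing to it.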
Refined-IBD Integration Example
```python
# Example of integrating with Refined-IBD
def process_with_refined_ibd(vcf_file, map_file):
    """
    Process VCF data with Refined-IBD and integrate with Bonsai.

    Args:
        vcf_file: Path to VCF file with genetic data
        map_file: Path to genetic map file

    Returns:
        Processed IBD segments ready for Bonsai analysis
    """
    # Prepare Refined-IBD input
    refined_ibd_input = prepare_refined_ibd_input(vcf_file, map_file)

    # Run Refined-IBD (external process)
    refined_ibd_output = run_refined_ibd(refined_ibd_input)

    # Parse Refined-IBD output
    detected_segments = parse_refined_ibd_output(refined_ibd_output)

    # Convert to Bonsai format
    bonsai_segments = convert_to_bonsai_format(detected_segments, "refined-ibd")

    # Process with Bonsai
    processed_segments = ibd_processor.process_segments(bonsai_segments)

    return processed_segments
```
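The parsing step in the example above could be sketched as follows. It assumes the nine-column, whitespace-delimited layout described in the Refined IBD documentation (sample 1, haplotype 1, sample 2, haplotype 2, chromosome, start, end, LOD score, cM length); check this against the version of the tool you run. The function name and output keys are hypothetical, chosen to match the field names used in the conversion example earlier in this lab.

```python
from typing import Dict, Iterable, List


def parse_refined_ibd_lines(lines: Iterable[str]) -> List[Dict]:
    """Parse Refined IBD output rows into segment dictionaries."""
    segments: List[Dict] = []
    for line in lines:
        fields = line.split()
        if len(fields) != 9:
            continue  # skip blank or malformed rows
        id1, hap1, id2, hap2, chromosome, start, end, lod, cm = fields
        segments.append({
            "id1": id1,
            "id2": id2,
            "hap1": int(hap1),   # haplotype index (1 or 2) in sample 1
            "hap2": int(hap2),   # haplotype index (1 or 2) in sample 2
            "chromosome": chromosome,
            "start_pos": int(start),
            "end_pos": int(end),
            "lod": float(lod),
            "cm_length": float(cm),
        })
    return segments


if __name__ == "__main__":
    print(parse_refined_ibd_lines(["S1 1 S2 2 1 1000000 5000000 12.3 4.5"]))
```

Keeping the haplotype indices and LOD score preserves the phase and confidence information that simpler segment formats discard.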
Tool Selection Considerations
Different IBD detection tools have different strengths and limitations:
Tool | Strengths | Limitations | Best For |
---|---|---|---|
GERMLINE | Speed, scalability | Lower precision | Large datasets, initial screening |
Refined-IBD | Accuracy, modeled error rates | Computational intensity | High-precision requirements |
hap-IBD | Robust to phasing errors | Complex parameters | Datasets with phasing challenges |
IBIS | Works with unphased data | Less reliable for short segments | Unphased consumer data |
Bonsai's integration modules account for these differences and can adjust its analysis approach based on the IBD detection tool used.
Creating Integrated Workflows
End-to-End Genetic Genealogy Processes
By combining Bonsai v3 with other tools and systems, researchers can create comprehensive genetic genealogy workflows tailored to specific research questions and contexts:
Example Workflow: Unknown Parentage Case
1. Data Collection: Testing with multiple platforms for maximum match coverage
2. IBD Detection: Using specialized tools to identify shared DNA segments
3. Relationship Inference: Using Bonsai to predict relationships from IBD patterns
4. Match Clustering: Grouping matches by likely family branches
5. Tree Building: Constructing partial family trees for each cluster
6. Common Ancestor Identification: Finding connecting points between trees
7. Hypothesis Validation: Using documentary research to verify predictions
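The match-clustering step in this workflow is often approximated by grouping matches that also match each other. The sketch below does this with a union-find structure over shared-match pairs, which yields connected components; it is one simple technique among several, not necessarily what Bonsai does, and the function name and input shape are assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def cluster_matches(shared_pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group match IDs into clusters: connected components of the shared-match graph."""
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in shared_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters: Dict[str, set] = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return sorted((sorted(members) for members in clusters.values()),
                  key=lambda members: members[0])


if __name__ == "__main__":
    pairs = [("A", "B"), ("B", "C"), ("D", "E")]
    print(cluster_matches(pairs))  # [['A', 'B', 'C'], ['D', 'E']]
```

Connected components are a coarse grouping: a single match related through both parents can merge two branches, so in practice the pairs are usually filtered by a minimum shared-cM threshold first.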
Example Workflow: Population Study
1. Sample Collection: Gathering genetic data from the population of interest
2. Admixture Analysis: Using population genetics tools to assess ancestry
3. IBD Detection: Identifying shared segments within the population
4. Relationship Network Construction: Using Bonsai to build a comprehensive relationship network
5. Historical Context Integration: Incorporating documentary and demographic information
6. Network Analysis: Applying social network analysis to the relationship structure
7. Visualization and Reporting: Presenting the findings with appropriate visualizations
Integration Pipeline Example
```python
# Example of a complete integration pipeline
def run_integrated_workflow(raw_data_files, known_relationships=None):
    """
    Run a complete integrated genetic genealogy workflow.

    Args:
        raw_data_files: List of paths to raw genetic data files
        known_relationships: Optional dict of known family relationships

    Returns:
        Complete analysis results
    """
    # Phase 1: Data preparation
    processed_data = []
    for file_path in raw_data_files:
        file_format = detect_file_format(file_path)
        processed_file = process_raw_data(file_path, file_format)
        processed_data.append(processed_file)

    # Phase 2: IBD detection (using appropriate external tool)
    ibd_segments = detect_ibd_segments(processed_data)

    # Phase 3: Relationship inference with Bonsai
    relationship_predictions = bonsai_analyzer.infer_relationships(ibd_segments)

    # Phase 4: Family tree integration
    if known_relationships:
        integrated_pedigree = integrate_with_known_relationships(
            relationship_predictions,
            known_relationships
        )
    else:
        integrated_pedigree = build_pedigree_from_predictions(relationship_predictions)

    # Phase 5: Visualization and reporting
    visualizations = generate_visualizations(integrated_pedigree)
    report = generate_analysis_report(integrated_pedigree, relationship_predictions)

    return {
        "pedigree": integrated_pedigree,
        "relationships": relationship_predictions,
        "visualizations": visualizations,
        "report": report
    }
```
Conclusion and Next Steps
Bonsai v3's integration capabilities enable it to function as a key component in comprehensive genetic genealogy workflows, connecting with DNA testing platforms, family tree systems, IBD detection tools, and other specialized resources. Through mechanisms like the DRUID algorithm, standardized data exchange formats, and flexible APIs, Bonsai can adapt to diverse research contexts and leverage complementary tools to enhance its relationship inference capabilities.
By understanding and utilizing these integration mechanisms, researchers can create powerful, customized workflows that combine the strengths of multiple tools and data sources to address complex genetic genealogy challenges.
In the next lab, we'll explore how to implement end-to-end pedigree reconstruction pipelines using Bonsai v3, integrating all the components we've studied throughout this course.
Your Learning Pathway
Interactive Lab Environment
The interactive Lab 28 notebook runs in Google Colab. Data is downloaded automatically from S3 when you run the notebook.
Note: You may need a Google account to save your work in Google Drive.
This lab is part of the Visualization & Advanced Applications track: Lab 21 (Rendering), Lab 22 (Interpreting), Lab 23 (Twins), Lab 24 (Complex), Lab 25 (Real-World), Lab 26 (Performance), Lab 27 (Prior Models), Lab 28 (Integration), Lab 29 (End-to-End), Lab 30 (Advanced).