Computational Genetic Genealogy

Lab 21: Pedigree Rendering and Visualization

Core Component: This lab explores the pedigree rendering and visualization techniques used in Bonsai v3. These techniques are essential for helping users interpret and understand the results of genetic genealogy analyses. Effective visualization makes complex pedigree structures more accessible and highlights important genetic relationships.

The Importance of Visualization in Genetic Genealogy

Why Visualization Matters

Genetic pedigrees are complex structures that can be difficult to interpret from raw data alone. Effective visualization serves several critical functions:

  • Intuitive Understanding: Translates abstract genetic relationships into visually intuitive family structures
  • Pattern Recognition: Helps identify patterns and connections that might be missed in tabular data
  • Communication: Facilitates sharing and discussing findings with others
  • Validation: Provides a way to visually confirm that inferred relationships make biological sense

Bonsai v3's rendering module treats a pedigree as a directed graph and draws it with Graphviz, with options to set node colors, set node labels, and highlight a focal individual.

Graph-Based Representation of Pedigrees

Pedigrees as Directed Graphs

In Bonsai v3, pedigrees are naturally represented as directed graphs, where:

  • Nodes represent individuals
  • Edges represent parent-child relationships
  • Direction flows from parent to child

This graph-based representation enables the application of powerful graph algorithms for analyzing family structures and provides a natural basis for visualization.

Up-Node Dictionary: The Foundation of Pedigree Representation

Bonsai v3 uses an "up-node dictionary" as its primary data structure for representing pedigrees. This dictionary maps each individual to their parents:

{
  child_id_1: {parent_id_1: degree, parent_id_2: degree},
  child_id_2: {parent_id_3: degree, parent_id_4: degree},
  ...
}

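This structure encodes the directed graph of the pedigree: each key is a child node, each inner key is one of that child's parents, and each inner value is the degree of the link (1 for a direct parent-child link, as in the examples in this lab). An individual with no recorded parents either maps to an empty dictionary or does not appear as a key at all.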
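
Because the dictionary is keyed by child, some individuals appear only inside the inner dictionaries, and looking up a person's children requires inverting the mapping. The following is a minimal sketch of both operations. The helper names collect_ids and invert_up_dct are hypothetical and chosen for illustration; they are not Bonsai functions.

```python
def collect_ids(up_dct):
    """Return every ID that appears as a child or as a parent."""
    ids = set(up_dct)
    for parents in up_dct.values():
        ids.update(parents)
    return ids


def invert_up_dct(up_dct):
    """Turn a child -> {parent: degree} mapping into parent -> {child: degree}."""
    down_dct = {}
    for child, parents in up_dct.items():
        for parent, degree in parents.items():
            down_dct.setdefault(parent, {})[child] = degree
    return down_dct


up_dct = {3: {1: 1, 2: 1}, 5: {3: 1, 4: 1}}
all_ids = collect_ids(up_dct)      # includes 1, 2 and 4, which are never keys
down_dct = invert_up_dct(up_dct)   # e.g. down_dct[3] lists the children of 3
```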

Example: Simple Family Structure

Consider a simple family with grandparents (1, 2), parents (3, 4), and a child (5). In the up-node dictionary format:

{
  3: {1: 1, 2: 1},  # Individual 3 has parents 1 and 2
  4: {},            # Individual 4 has no parents in the pedigree
  5: {3: 1, 4: 1}   # Individual 5 has parents 3 and 4
}

This compact representation captures the entire family structure and can be easily rendered as a directed graph.
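
Two quick checks are useful before rendering a dictionary like this one: which individuals are founders (no recorded parents), and whether any entry lists more parents than is biologically possible. The sketch below runs both checks on the example family. The function names find_founders and check_parent_counts are assumptions made for this illustration, not part of Bonsai.

```python
up_dct = {
    3: {1: 1, 2: 1},
    4: {},
    5: {3: 1, 4: 1},
}


def find_founders(up_dct):
    """Return IDs with no recorded parents (missing key or empty parent dict)."""
    ids = set(up_dct)
    for parents in up_dct.values():
        ids.update(parents)
    return {i for i in ids if not up_dct.get(i)}


def check_parent_counts(up_dct, max_parents=2):
    """Return IDs that list more than max_parents parents or list themselves."""
    return {
        child
        for child, parents in up_dct.items()
        if len(parents) > max_parents or child in parents
    }


founders = find_founders(up_dct)
problems = check_parent_counts(up_dct)
```

Here the founders are the two grandparents and individual 4, and no entry violates the parent-count check.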

The render_ped Function in Bonsai v3

Core Visualization Function

At the heart of Bonsai's pedigree visualization is the render_ped function in the rendering.py module, which converts an up-node dictionary into a graphical representation using the Graphviz library:

def render_ped(
    up_dct: dict[int, dict[int, int]],
    name: str,
    out_dir: str,
    color_dict=None,
    label_dict=None,
    focal_id=None,
):
    """
    Render a pedigree as a directed graph.
    
    Args:
        up_dct: Up-node dictionary mapping individuals to their parents
        name: Base name for the output file
        out_dir: Directory to save the rendered image
        color_dict: Dictionary mapping node IDs to colors
        label_dict: Dictionary mapping node IDs to labels
        focal_id: ID of the focal individual to highlight
    """
    dot = graphviz.Digraph(name)
    all_id_set = get_all_id_set(up_dct)
    
    # Set default values
    if color_dict is None:
        color_dict = {i: 'dodgerblue' for i in all_id_set if i > 0}
    if label_dict is None:
        label_dict = {n: str(n) for n in all_id_set}
    if focal_id is not None:
        color_dict[focal_id] = 'red'
    
    # Add nodes (individuals)
    for n in all_id_set:
        edge_color = None
        fill_color = color_dict[n] if n in color_dict else None
        style = 'filled' if n in color_dict else None
        label = label_dict.get(n, "")
        
        dot.node(
            str(n),
            color=edge_color,
            fillcolor=fill_color,
            style=style,
            label=label,
        )
    
    # Add edges (parent-child relationships)
    for c, pset in up_dct.items():
        for p in pset:
            dot.edge(str(p), str(c), arrowhead='none')
    
    # Render the graph
    plt.clf()
    dot.render(directory=out_dir).replace('\\', '/')

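This function provides a flexible foundation for pedigree visualization, with options to customize colors, set labels, and highlight a focal individual. A few details of the listing are worth noting. It relies on names defined elsewhere in the module: graphviz, matplotlib's plt, and the get_all_id_set helper. The default color_dict fills only nodes with positive IDs, so any other nodes are drawn unfilled. When focal_id is passed together with a caller-supplied color_dict, that dictionary is modified in place. Edges are drawn with arrowhead='none', so direction is conveyed only by position, with parents placed above children in Graphviz's default top-down layout.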
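
To inspect the graph structure without installing Graphviz, the node and edge loops can be imitated by writing DOT text by hand. This is only a sketch: up_dct_to_dot is a hypothetical helper, it is not how Bonsai produces its output, and it hard-codes the same default colors as the listing (fill for positive IDs, red for the focal individual).

```python
def up_dct_to_dot(up_dct, name="pedigree", focal_id=None):
    """Return DOT source with one node per individual and one edge per parent-child link."""
    ids = set(up_dct)
    for parents in up_dct.values():
        ids.update(parents)

    lines = [f"digraph {name} {{"]
    for n in sorted(ids):
        # Same defaults as the listing: focal node red, positive IDs blue, others unfilled
        if n == focal_id:
            fill = "red"
        elif n > 0:
            fill = "dodgerblue"
        else:
            fill = None
        attrs = [f'label="{n}"']
        if fill is not None:
            attrs += ["style=filled", f"fillcolor={fill}"]
        attr_text = " ".join(attrs)
        lines.append(f'  "{n}" [{attr_text}];')

    # Edges run from parent to child and are drawn without arrowheads
    for child, parents in sorted(up_dct.items()):
        for parent in sorted(parents):
            lines.append(f'  "{parent}" -> "{child}" [arrowhead=none];')
    lines.append("}")
    return "\n".join(lines)


dot_text = up_dct_to_dot({3: {1: 1, 2: 1}, 4: {}, 5: {3: 1, 4: 1}}, focal_id=5)
print(dot_text)
```

The printed text can be saved to a file and passed to any Graphviz layout program, which makes it easy to see exactly which nodes and edges a given up-node dictionary produces.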

Customizing Pedigree Visualizations

Beyond Basic Rendering

While Bonsai's render_ped function provides solid baseline functionality, there are many ways to enhance and customize pedigree visualizations:

1. Node Attributes by Individual Characteristics

Nodes can be customized to represent individual characteristics:

  • Color coding by sex (blue for males, pink for females)
  • Shape variation by status (rectangles for living, ovals for deceased)
  • Border styles for additional attributes (dashed for adopted, dotted for uncertain)
  • Size variation for emphasis or to represent additional metrics
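
Example: Sex-Based Node Styling

# Collect every individual, including founders who appear only as parents
all_ids = set(pedigree)
for parents in pedigree.values():
    all_ids.update(parents)

# Create dictionaries for customization; individuals missing from sex_dict are skipped
color_dict = {
    id_val: 'skyblue' if sex_dict[id_val] == 'M' else 'pink'
    for id_val in all_ids if id_val in sex_dict
}

shape_dict = {
    id_val: 'box' if sex_dict[id_val] == 'M' else 'ellipse'
    for id_val in all_ids if id_val in sex_dict
}

# Use in an enhanced rendering function (dot is a graphviz.Digraph)
for node_id in all_ids:
    dot.node(
        str(node_id),
        fillcolor=color_dict.get(node_id, 'white'),
        shape=shape_dict.get(node_id, 'box'),
        style='filled'
    )

2. Edge Styling for Relationship Information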

Edges can be styled to convey relationship information:

  • Color variation for relationship types or confidence
  • Line thickness for relationship closeness or strength of evidence
  • Line styles for relationship types (solid for biological, dashed for adoptive)
  • Edge labels for additional relationship details
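
One way to keep these encodings consistent is to compute edge attributes in a single function. The sketch below is an assumption-laden illustration: the relationship type names, the 0 to 1 confidence scale, and the edge_attrs helper are all hypothetical, and only the attribute names (style, color, penwidth, label) come from Graphviz.

```python
# Hypothetical style table: one entry per relationship type
EDGE_STYLES = {
    "biological": {"style": "solid", "color": "black"},
    "adoptive": {"style": "dashed", "color": "gray40"},
}


def edge_attrs(rel_type="biological", confidence=1.0):
    """Return Graphviz edge attributes (all strings) for one parent-child link."""
    attrs = dict(EDGE_STYLES.get(rel_type, EDGE_STYLES["biological"]))
    # Thicker lines for stronger evidence: penwidth runs from 1.0 to 3.0
    confidence = max(0.0, min(1.0, confidence))
    attrs["penwidth"] = f"{1.0 + 2.0 * confidence:.1f}"
    # Label the edge with its confidence only when it is below certainty
    if confidence < 1.0:
        attrs["label"] = f"{confidence:.2f}"
    return attrs


certain = edge_attrs()
uncertain_adoptive = edge_attrs("adoptive", 0.5)
```

With a graphviz.Digraph named dot, the result could be applied as dot.edge(str(parent), str(child), **edge_attrs("adoptive", 0.5)).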

3. Layout Customization

The layout of the pedigree can significantly impact its interpretability:

  • Direction settings (top-down, bottom-up, left-right)
  • Node spacing for clearer visual separation
  • Subgraph clustering for organizing related individuals
  • Rank alignment to position individuals by generation
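
Rank alignment needs a generation number for each individual. A minimal sketch, assuming founders are generation 0 and each child sits one generation below its deepest parent; assign_generations and rank_statements are hypothetical helpers, and the output uses the DOT rank=same construct.

```python
def assign_generations(up_dct):
    """Map each ID to a generation: founders are 0, a child is one below its deepest parent."""
    generations = {}

    def depth(node, path=()):
        if node in path:
            raise ValueError(f"cycle detected at individual {node}")
        if node not in generations:
            parents = up_dct.get(node, {})
            generations[node] = 1 + max(
                (depth(p, path + (node,)) for p in parents), default=-1
            )
        return generations[node]

    ids = set(up_dct)
    for parents in up_dct.values():
        ids.update(parents)
    for node in ids:
        depth(node)
    return generations


def rank_statements(generations):
    """Return one DOT '{rank=same; ...}' statement per generation."""
    by_gen = {}
    for node, gen in generations.items():
        by_gen.setdefault(gen, []).append(node)
    return [
        "{rank=same; " + "; ".join(f'"{n}"' for n in sorted(nodes)) + ";}"
        for _, nodes in sorted(by_gen.items())
    ]


generations = assign_generations({3: {1: 1, 2: 1}, 4: {}, 5: {3: 1, 4: 1}})
statements = rank_statements(generations)
```

Note a limitation of this simple rule: individual 4, a founder who is the partner of individual 3, lands in generation 0 with the grandparents. Aligning such founders with their partners would need an extra rule. In the graphviz Python package, the same grouping can be expressed with a subgraph whose rank attribute is set to 'same'.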

Graphviz Layout Options

Graphviz supports several layout algorithms that can be applied to pedigrees:

  • dot: Hierarchical layout ideal for pedigrees with clear generational structure
  • neato: Spring model layout useful for pedigrees with many interconnections
  • fdp: Force-directed layout good for large pedigrees
  • twopi: Radial layout that places a chosen root node (for example, the focal individual) at the center, with other nodes on concentric rings around it
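
Each of these engines is also a Graphviz command-line program, so one DOT file can be laid out several ways for comparison. The sketch below only builds the argument lists; layout_command and the file names are hypothetical, and actually running the commands (for example with subprocess.run) assumes Graphviz is installed.

```python
LAYOUT_ENGINES = ("dot", "neato", "fdp", "twopi")


def layout_command(engine, dot_path, out_path, fmt="png"):
    """Return the argument list that runs one Graphviz layout engine on a DOT file."""
    if engine not in LAYOUT_ENGINES:
        raise ValueError(f"unknown layout engine: {engine!r}")
    # -T selects the output format, -o the output file
    return [engine, f"-T{fmt}", dot_path, "-o", out_path]


commands = [
    layout_command(engine, "pedigree.gv", f"pedigree_{engine}.png")
    for engine in LAYOUT_ENGINES
]
```

When working through the graphviz Python package instead, the engine can be chosen with the engine argument of graphviz.Digraph.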

Visualizing IBD Sharing in Pedigrees

Integrating Genetic Evidence in Visualizations

One of the most powerful applications of pedigree visualization is the ability to overlay genetic sharing information onto the family structure. This helps users understand how genetic evidence supports the inferred relationships.

IBD Overlay Techniques

  • Additional edges between individuals who share DNA
  • Edge thickness proportional to amount of shared DNA
  • Color gradients to indicate strength of genetic connection
  • Edge labels showing total cM and segment counts

Example: Adding IBD Sharing Information

# First, add regular parent-child edges
for child, parents in pedigree.items():
    for parent in parents:
        dot.edge(
            str(parent),
            str(child),
            color='black',
            style='solid',
            penwidth='1'
        )

# Then add IBD sharing edges with custom styling
for (id1, id2), data in ibd_data.items():
    total_cm = data['total_cm']
    
    # Skip if below threshold
    if total_cm < min_cm:
        continue
    
    # Calculate edge attributes based on total cM
    # Thicker edges for more sharing
    penwidth = 0.5 + min(5, total_cm / 500)
    
    # Color intensity based on total cM
    intensity = min(255, int(50 + (total_cm / 3500) * 205))
    color = f"#{intensity:02x}00{255-intensity:02x}"
    
    # Add the IBD edge
    dot.edge(
        str(id1),
        str(id2),
        color=color,
        style='dashed',
        penwidth=str(penwidth),
        constraint='false',  # Don't use this edge for layout
        label=f"{total_cm:.1f} cM"
    )

This technique creates a visual representation that combines the structural information of the pedigree with the genetic evidence supporting those relationships, providing a more complete picture.
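
To make the styling arithmetic concrete, the two formulas from the example can be pulled into a small function and evaluated at a few values. ibd_edge_style is a hypothetical wrapper; the constants (500, 3500, 50, 205) are the ones used in the example above.

```python
def ibd_edge_style(total_cm, max_cm=3500.0):
    """Return (penwidth, hex color) for an IBD edge, using the example's formulas."""
    # Width grows by 1 per 500 cM and is capped at 0.5 + 5
    penwidth = 0.5 + min(5, total_cm / 500)
    # Red channel rises from 50 to 255 as sharing approaches max_cm; blue falls to match
    intensity = min(255, int(50 + (total_cm / max_cm) * 205))
    color = f"#{intensity:02x}00{255 - intensity:02x}"
    return penwidth, color


low = ibd_edge_style(0)        # (0.5, '#3200cd'): thin and mostly blue
mid = ibd_edge_style(1750)     # (4.0, '#980067'): medium width, purple
high = ibd_edge_style(3500)    # (5.5, '#ff0000'): widest, pure red
```

Pairs sharing little DNA therefore appear as thin blue dashed lines, while close relatives appear as thick red ones.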

Chromosome Painting Visualizations

Visualizing Segment-Level IBD Sharing

Chromosome painting is another important visualization technique in genetic genealogy that complements pedigree diagrams by showing the specific chromosomal segments shared between individuals.

Although chromosome painting is not implemented in Bonsai's rendering module, a painting can be built with matplotlib to show segment-level detail:

Example: Chromosome Painting Implementation

import matplotlib.pyplot as plt


def create_chromosome_painting(individual_id, ibd_data, figsize=(15, 10)):
    """
    Create a chromosome painting visualization for an individual.
    
    Args:
        individual_id: ID of the individual to visualize
        ibd_data: Dictionary mapping pairs of individuals to IBD sharing data
        figsize: Figure size (width, height)
        
    Returns:
        Matplotlib figure
    """
    # Extract segments involving the individual
    segments = []
    for (id1, id2), data in ibd_data.items():
        if id1 == individual_id or id2 == individual_id:
            other_id = id2 if id1 == individual_id else id1
            for segment in data['segments']:
                segments.append({
                    'chromosome': segment['chromosome'],
                    'start_pos': segment['start_pos'],
                    'end_pos': segment['end_pos'],
                    'cm': segment['cm'],
                    'other_id': other_id
                })
    
    # Sort segments by chromosome and position
    segments.sort(key=lambda s: (
        int(s['chromosome']) if s['chromosome'].isdigit() else 999, 
        s['start_pos']
    ))
    
    # Get the unique chromosomes
    chromosomes = sorted(set(s['chromosome'] for s in segments), 
                        key=lambda x: int(x) if x.isdigit() else 999)
    
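    # Nothing to draw: return an empty figure instead of failing in plt.subplots
    if not chromosomes:
        return plt.figure(figsize=figsize)

    # Create figure with one subplot per chromosome
    fig, axs = plt.subplots(len(chromosomes), 1, figsize=figsize, 
                          squeeze=False, sharex=True)
    axs = axs.flatten()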
    
    # Create a color map for each unique "other_id"
    other_ids = sorted(set(s['other_id'] for s in segments))
    colors = plt.cm.tab10.colors
    color_map = {other_id: colors[i % len(colors)] 
                for i, other_id in enumerate(other_ids)}
    
    # Draw segments on each chromosome
    for i, chrom in enumerate(chromosomes):
        ax = axs[i]
        chrom_segments = [s for s in segments if s['chromosome'] == chrom]
        
        # Draw chromosome backbone
        ax.plot([0, max(s['end_pos'] for s in chrom_segments)], 
                [0, 0], 'k-', linewidth=2)
        
        # Draw segments
        for segment in chrom_segments:
            other_id = segment['other_id']
            ax.plot(
                [segment['start_pos'], segment['end_pos']],
                [0, 0],
                '-',
                linewidth=10,
                color=color_map[other_id],
                solid_capstyle='butt',
                alpha=0.7
            )
    
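    # Label each row with its chromosome and add a legend keyed by matching individual
    for ax, chrom in zip(axs, chromosomes):
        ax.set_ylabel(f"chr{chrom}", rotation=0, ha='right', va='center')
        ax.set_yticks([])
    handles = [plt.Line2D([0], [0], color=color_map[o], linewidth=10) for o in other_ids]
    fig.legend(handles, [str(o) for o in other_ids], title='Shared with')

    return fig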

Chromosome paintings provide complementary information to pedigree diagrams, showing exactly which parts of the genome are shared between individuals. When used together, these visualization techniques offer a comprehensive view of genetic relationships.
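
The painting function above draws every segment at the same height, so segments from different matches that overlap on a chromosome are drawn on top of each other. One possible refinement, sketched here under the assumption that segments use the same start_pos and end_pos keys, is to assign each segment a row ("track") so that overlapping segments are stacked. assign_tracks is a hypothetical helper and the positions in the usage lines are made-up illustration values.

```python
def assign_tracks(segments):
    """Give each segment a 'track' index so overlapping segments land on different rows."""
    track_ends = []  # end position of the last segment placed on each track
    placed = []
    for seg in sorted(segments, key=lambda s: s["start_pos"]):
        for track, end in enumerate(track_ends):
            if seg["start_pos"] >= end:
                # This track is free again: reuse it
                track_ends[track] = seg["end_pos"]
                break
        else:
            # Every existing track is still occupied: open a new one
            track = len(track_ends)
            track_ends.append(seg["end_pos"])
        placed.append({**seg, "track": track})
    return placed


# Illustrative segments on one chromosome (positions are invented for the example)
chrom_segments = [
    {"start_pos": 10, "end_pos": 50, "other_id": 2},
    {"start_pos": 30, "end_pos": 80, "other_id": 3},
    {"start_pos": 60, "end_pos": 90, "other_id": 2},
]
placed = assign_tracks(chrom_segments)
tracks = [seg["track"] for seg in placed]
```

Applied per chromosome, the track value can replace the constant 0 used as the y-coordinate when drawing each segment.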

Practical Applications and Best Practices

Using Pedigree Visualization Effectively

Effective pedigree visualization is both an art and a science. Here are some best practices for creating clear, informative pedigree visualizations:

Challenge | Solution | Implementation
Large pedigrees become unwieldy | Focus on subtrees of interest | Extract subtrees using get_subdict() before rendering
Unclear relationship types | Use consistent visual encoding | Standardize edge styles and colors for relationship types
Difficulty identifying key individuals | Highlight focal individuals | Use the focal_id parameter or custom colors
Overlapping edges in complex pedigrees | Adjust layout algorithms | Try different Graphviz engines (dot, neato, fdp)
Uncertainty in relationships | Encode confidence visually | Use dashed lines or color gradients for uncertain connections

Common Applications of Pedigree Visualization

  1. Verifying relationship hypotheses by visualizing how they fit into existing family structures
  2. Identifying potential connections between seemingly unrelated individuals
  3. Documenting complex family histories for genealogical research
  4. Communicating findings to family members and other researchers
  5. Validating genetic analysis results by ensuring they form biologically plausible structures
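
For the first challenge in the table, restricting a large pedigree to one individual's ancestors is often enough to make it readable. The sketch below shows the idea with a plain upward traversal. ancestor_subdict is a hypothetical stand-in written for this lab text; it is not Bonsai's get_subdict(), whose exact behavior may differ.

```python
def ancestor_subdict(up_dct, focal_id):
    """Return the part of up_dct reachable by walking upward from focal_id."""
    sub_dct = {}
    stack = [focal_id]
    while stack:
        node = stack.pop()
        if node in sub_dct:
            continue  # already visited via another path
        parents = up_dct.get(node, {})
        sub_dct[node] = dict(parents)
        stack.extend(parents)
    return sub_dct


# Individuals 6 and 7 belong to a side branch that individual 5 does not descend from
up_dct = {3: {1: 1, 2: 1}, 4: {}, 5: {3: 1, 4: 1}, 6: {3: 1, 7: 1}}
sub_dct = ancestor_subdict(up_dct, 5)
```

The result is itself an up-node dictionary (founders map to empty dictionaries), so it can be passed to a rendering function in place of the full pedigree.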

Extending Bonsai's Visualization Capabilities

Beyond Basic Rendering

While Bonsai's render_ped function provides a solid foundation, it can be extended in various ways to create more sophisticated visualizations:

Interactive Visualizations

Converting static pedigree diagrams to interactive visualizations using tools like D3.js or Plotly can significantly enhance user exploration:

  • Zooming and panning for navigating large pedigrees
  • Hover tooltips showing detailed information about individuals
  • Collapsible subtrees for managing complexity
  • Dynamic filtering to show specific relationships or IBD thresholds
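
Interactive tools generally need the pedigree as data rather than as an image. A common interchange shape is a JSON object with a list of nodes and a list of links, which force-directed graph examples in D3.js typically consume. The sketch below assumes that shape; up_dct_to_node_link and the field names are illustrative choices, not a Bonsai export format.

```python
import json


def up_dct_to_node_link(up_dct, label_dict=None):
    """Convert an up-node dictionary into a {'nodes': [...], 'links': [...]} structure."""
    ids = set(up_dct)
    for parents in up_dct.values():
        ids.update(parents)
    label_dict = label_dict or {}
    nodes = [{"id": str(n), "label": label_dict.get(n, str(n))} for n in sorted(ids)]
    # One link per parent-child edge, pointing from parent to child
    links = [
        {"source": str(parent), "target": str(child), "degree": degree}
        for child, parents in sorted(up_dct.items())
        for parent, degree in sorted(parents.items())
    ]
    return {"nodes": nodes, "links": links}


graph = up_dct_to_node_link({3: {1: 1, 2: 1}, 5: {3: 1, 4: 1}})
payload = json.dumps(graph, indent=2)  # text that a web page could load
```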

Integrating Multiple Data Types

Pedigree visualizations can be enhanced by incorporating additional data types:

  • Historical records linked to specific individuals
  • Geographic information showing migration patterns
  • Ethnicity estimates encoded in node colors or patterns
  • Timeline information showing temporal relationships
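
The simplest place to surface extra data is the node label, since render_ped already accepts a label_dict. A minimal sketch, assuming names and birth years are available as plain dictionaries keyed by ID; build_label_dict and the sample values are hypothetical. Lines are joined with the two-character sequence backslash-n, which Graphviz interprets as a line break inside a label.

```python
def build_label_dict(ids, names=None, birth_years=None):
    """Build multi-line node labels from optional name and birth-year lookups."""
    names = names or {}
    birth_years = birth_years or {}
    labels = {}
    for n in ids:
        lines = [names.get(n, str(n))]  # fall back to the ID when no name is known
        if n in birth_years:
            lines.append(f"b. {birth_years[n]}")
        labels[n] = "\\n".join(lines)  # literal backslash-n for Graphviz
    return labels


# Invented sample metadata for illustration only
label_dict = build_label_dict(
    [3, 4, 5],
    names={5: "Child", 3: "Parent A"},
    birth_years={5: 1990},
)
```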

Community Extensions

The genetic genealogy community has developed various extensions to standard pedigree visualization:

  • McGuire diagrams for visualizing shared DNA between multiple individuals
  • Fan charts for compact representation of ancestral relationships
  • DNA painter style visualizations for chromosome mapping
  • Network graphs showing complex interrelationships in endogamous populations

Conclusion and Next Steps

Pedigree rendering and visualization are essential components of computational genetic genealogy, transforming abstract genetic relationships into intuitive visual forms. Bonsai v3's rendering capabilities provide a flexible foundation for creating clear, informative pedigree visualizations that help users interpret genetic data in the context of family structures.

By customizing node and edge attributes, integrating IBD sharing information, and applying effective visual design principles, pedigree visualizations can become powerful tools for understanding complex family relationships.

In the next lab, we'll explore how to interpret results and assess confidence in relationship predictions, complementing the visual representations we've explored in this lab with statistical measures of certainty.

Interactive Lab Environment

Run the interactive Lab 21 notebook in Google Colab:

Data will be automatically downloaded from S3 when you run the notebook.

Note: You may need a Google account to save your work in Google Drive.

This lab is part of the Visualization & Advanced Applications track: Lab 21 (Rendering), Lab 22 (Interpreting), Lab 23 (Twins), Lab 24 (Complex), Lab 25 (Real-World), Lab 26 (Performance), Lab 27 (Prior Models), Lab 28 (Integration), Lab 29 (End-to-End), Lab 30 (Advanced).