Computational Genetic Genealogy

Glossary of Genetic Terms

This glossary provides definitions for key terms used throughout the course, with an emphasis on their relevance to computational genetic genealogy and anthropological research.

Allele
One of the alternative forms of a gene or genetic locus. Different alleles can result in different observable traits.
Autosomal DNA
DNA contained in the 22 pairs of autosomes (non-sex chromosomes) that is inherited from both parents and recombined during meiosis.
centiMorgan (cM)
A unit of genetic distance based on recombination frequency. Two loci that are one centiMorgan apart have approximately a 1% chance of being separated by recombination in a single meiosis (one generation).
Chromosome
A long DNA molecule, packaged with associated proteins, that contains many genes and carries genetic information.
Endogamy
The practice of marrying within a specific social group, resulting in increased genetic relatedness within the population and potentially complicating genetic relationship inference.
Founder Effect
The reduction in genetic diversity that occurs when a small subset of individuals from a larger population establishes a new population, common in diaspora communities.
Genetic Map
A representation of the relative positions of genetic markers along a chromosome based on recombination frequencies, measured in centiMorgans.
Genotype
The genetic constitution of an individual at specific genetic loci.
Haplotype
A set of alleles or genetic markers located on the same chromosome and inherited together from a single parent.
HBD (Homozygous-By-Descent)
Segments of DNA where an individual's two copies (one maternal, one paternal) are identical by descent, typically because the parents are related and passed on the same ancestral segment (e.g., in consanguineous marriages).
IBD (Identity-By-Descent)
Segments of DNA that are identical between two individuals because they were inherited from a common ancestor, forming the basis for genetic relationship inference.
IBS (Identity-By-State)
Genetic markers that are identical between individuals, without necessarily being inherited from a common ancestor.
Kinship Coefficient
A measure of relatedness between two individuals, defined as the probability that two alleles at the same locus, one sampled at random from each individual, are identical by descent.
LOD Score
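Logarithm of the odds score; a statistical measure that compares the likelihood of the data under one hypothesis (such as genetic linkage between loci, or a proposed genetic relationship) with its likelihood under the alternative, expressed on a logarithmic scale.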
MAF (Minor Allele Frequency)
The frequency of the less common allele at a polymorphic locus in a population.
Meiosis
The type of cell division that produces gametes (eggs or sperm) with half the chromosome number of the parent cell, involving recombination of genetic material.
Pedigree
A diagram representing the genetic relationships among individuals, tracing patterns of inheritance across generations.
Phasing
The process of determining which alleles are on the same chromosome (haplotype), crucial for accurate IBD detection.
Recombination
The exchange of genetic material between homologous chromosomes during meiosis, resulting in new combinations of alleles.
SNP (Single Nucleotide Polymorphism)
A variation at a single position in a DNA sequence, the most common type of genetic variation among people.
VCF (Variant Call Format)
A standard file format for storing genetic variation data, including SNPs and small insertions/deletions.
Y-DNA
DNA contained in the Y chromosome, passed from father to son and used to trace paternal lineages.
mtDNA (Mitochondrial DNA)
DNA contained in mitochondria, passed from mother to all children and used to trace maternal lineages.
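
To make the centiMorgan entry above concrete, the following sketch converts a genetic distance in cM into an expected recombination fraction. It assumes Haldane's map function (crossovers occur independently, with no interference), which is one common modeling choice and is not specified by this glossary. The function name `recombination_fraction` is illustrative only.

```python
import math


def recombination_fraction(distance_cm: float) -> float:
    """Expected recombination fraction between two loci.

    Assumption: Haldane's map function, r = 0.5 * (1 - exp(-2d)),
    where d is the genetic distance in Morgans (cM / 100).
    """
    if distance_cm < 0:
        raise ValueError("genetic distance must be non-negative")
    morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * morgans))


# Print the recombination fraction for a few illustrative distances.
for d in (1, 10, 50, 100):
    print(f"{d:>4} cM -> r = {recombination_fraction(d):.4f}")
```

Under this assumption, short distances give a recombination fraction close to the distance divided by 100 (so 1 cM is roughly 1%), while larger distances level off below 0.5, the value expected for unlinked loci.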

Terms Specific to Computational Genetic Genealogy

Genetic Ancestry Components
A more contemporary approach replacing the term "admixture." This concept recognizes that all human populations share common ancestry and have experienced ongoing genetic exchange throughout history. Rather than viewing populations as discrete entities that "mix," this framework acknowledges the continuous nature of human genetic variation and the artificial nature of population boundaries. All humans possess diverse genetic ancestry components reflecting our shared evolutionary history and complex patterns of migration, interaction, and relationship building.
Ancestry-Informed Analysis
An approach that uses patterns of genetic variation to identify genomic regions associated with traits or diseases that show frequency differences across geographic regions, without relying on discrete population categories. This approach acknowledges that genetic ancestry exists along continua rather than as discrete categories.
Coalescent Simulation
A computational approach that models the ancestry of sampled genomes backward in time until lineages merge (coalesce) in their most recent common ancestor, used in msprime simulations.
False Discovery Rate (FDR)
In IBD detection, the proportion of detected segments that are not truly IBD; equivalently, one minus precision.
Forward-time Simulation
A computational approach that models genetic inheritance forward in time through specified pedigrees, used in Ped-Sim simulations.
Genetic Genealogy
The use of genetic testing in combination with traditional genealogical methods to infer relationships between individuals.
Ground Truth
In simulation contexts, the actual genetic relationships and IBD segments that are known with certainty and used to evaluate detection methods.
Hidden Markov Model (HMM)
A statistical model used by some IBD detection algorithms to infer a hidden state (IBD or not IBD) at each position along the genome from observed genetic data.
Pedigree Reconstruction
The computational process of inferring family relationships and structures based solely on genetic data, without prior genealogical information.
Population Structure
The organization of genetic variation within and between populations, often visualized using techniques like Principal Component Analysis (PCA).
Positional Burrows-Wheeler Transform (PBWT)
An algorithm used in Hap-IBD for efficient detection of identical haplotype segments between individuals.
Precision
In IBD detection evaluation, the proportion of detected segments that are truly IBD.
Recall (Sensitivity)
In IBD detection evaluation, the proportion of true IBD segments that are successfully detected.
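
The evaluation terms above (precision, recall, and false discovery rate) can be sketched for a single pair of individuals on one chromosome. This is a minimal illustration under stated assumptions: segments are `(start, end)` tuples in a shared unit such as cM, the proportions are measured by overlapping length rather than by counting segments, and each list is assumed to contain no internally overlapping segments. The function names and the example coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Segment = Tuple[float, float]  # (start, end) on one chromosome, e.g. in cM


def total_length(segments: List[Segment]) -> float:
    """Summed length of all segments in the list."""
    return sum(end - start for start, end in segments)


def overlap_length(a: List[Segment], b: List[Segment]) -> float:
    """Total length shared between two segment lists.

    Assumption: segments within each list do not overlap one another.
    """
    return sum(
        max(0.0, min(a_end, b_end) - max(a_start, b_start))
        for a_start, a_end in a
        for b_start, b_end in b
    )


def evaluate(detected: List[Segment], truth: List[Segment]) -> Dict[str, float]:
    """Length-based precision, recall, and FDR of detected vs. true segments."""
    shared = overlap_length(detected, truth)
    detected_len = total_length(detected)
    truth_len = total_length(truth)
    # Precision and recall are undefined (NaN) when the denominator is zero.
    precision = shared / detected_len if detected_len > 0 else float("nan")
    recall = shared / truth_len if truth_len > 0 else float("nan")
    return {"precision": precision, "recall": recall, "fdr": 1.0 - precision}


# Made-up coordinates for illustration only.
truth = [(10.0, 30.0), (50.0, 60.0)]
detected = [(12.0, 28.0), (55.0, 70.0)]
print(evaluate(detected, truth))
```

In this example the second detected segment extends past the true segment, which lowers precision, while the trimmed ends of both detected segments lower recall.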

This glossary serves as a reference resource for all lab modules in the course:

Environment Setup
Data Acquisition
Processing & QC
IBD Detection
Simulation & Evaluation
Bonsai Algorithm

Tip: As you explore genetic genealogy, remember that terms and concepts are continuously evolving to better reflect our understanding of human genetic diversity. Contemporary approaches increasingly recognize the continuum of human genetic variation rather than discrete population categories.