Abstract
Genotype imputation is a widely used data augmentation approach that is applied to samples of related and/or unrelated individuals. Association testing may then be carried out on the complete data with commonly used methods. This approach has typically not accounted for the mix of observed and imputed data, although recent work has noted its potential to introduce confounding in case-control studies. In the Alzheimer's Disease Sequencing Project family sample, we found severe inflation of the test statistics in logistic regression analysis following genotype imputation, even after standard covariate adjustments. Here we dissect the sources of this inflation, which is driven by three factors: frequency-dependent bias in imputation-induced allele frequencies, differential measurement error, and differential genotyping rates in cases versus controls that introduce confounding. To address the problem, we propose a statistic, imputation deviance ((Formula presented.)), which can be easily computed from the observed and imputed genotype probabilities. We show that (Formula presented.), included as an additional fixed-effect covariate, controls the genome-wide inflation in analysis of this family-based sample. We speculate that imputation deviance may also provide a practical way to correct for genotype imputation effects in other settings, particularly when a data set is unbalanced and includes related individuals.
| Original language | English |
|---|---|
| Article number | e70021 |
| Journal | Genetic Epidemiology |
| Volume | 49 |
| Issue number | 8 |
| Publication status | Published - Dec 2025 |
| Externally published | Yes |
Keywords
- GWAS
- Pedigree
- WGS
- data augmentation
- genomic control
- missing data
- mixed model