Prof. Gina D'Angelo
Division of Biostatistics,
Title: A likelihood-based approach for missing genotype data
Abstract: Missing
genotype data in a candidate gene association study can make it difficult to
model the effects of multiple genetic variants simultaneously. In particular,
when regression models are used to model phenotype as a function of SNP
genotypes in several different genes, the most common approach is a complete
case analysis, in which only individuals with no missing genotypes are
included. But this can lead to a substantial reduction in sample size, and thus
to potential bias and a loss of efficiency. A number of other methods for handling
missing data are applicable, but have rarely been used in this context. The purpose of this paper is to
describe how several standard methods for handling missing data can be
applied or adapted to this problem, and to compare their performance using a
simulation study. We demonstrate these techniques using an Alzheimer's
disease association study. We show that the EM algorithm and multiple imputation with a bootstrapped EM sampling algorithm have
the best properties of all the estimators we studied.
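
As a rough illustration of why a complete case analysis can shrink the sample so sharply, the sketch below computes the expected fraction of individuals with no missing genotypes. It is not taken from the talk: it assumes every SNP is missing independently with the same probability, and the function name, the 5% missing rate, and the sample size of 1000 are hypothetical values chosen only for the example.

```python
def complete_case_fraction(missing_rate: float, n_snps: int) -> float:
    """Expected fraction of individuals with no missing genotype.

    Assumption (not from the abstract): each SNP genotype is missing
    independently with the same probability `missing_rate`.
    """
    if not 0.0 <= missing_rate <= 1.0:
        raise ValueError("missing_rate must lie in [0, 1]")
    if n_snps < 0:
        raise ValueError("n_snps must be non-negative")
    return (1.0 - missing_rate) ** n_snps


if __name__ == "__main__":
    n_individuals = 1000   # hypothetical study size
    missing_rate = 0.05    # hypothetical per-SNP missing rate
    for n_snps in (1, 5, 10, 20):
        frac = complete_case_fraction(missing_rate, n_snps)
        # Expected number of individuals retained in a complete case analysis
        print(f"{n_snps:2d} SNPs: about {round(n_individuals * frac)} complete cases")
```

Under these assumptions, a modest per-SNP missing rate compounds quickly as more SNPs enter the regression model, which is the sample size loss that the alternative methods in the abstract aim to avoid.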