Simple Methods for Handling Non-Randomly Missing Data
Abstract: In multiple linear or logistic regression, multiple imputation has become increasingly popular for handling missing covariate values. The much simpler approach of listwise deletion or complete-case analysis is often dismissed as making overly strong assumptions. However, I will point out that complete-case analysis is consistent and performs better than multiple imputation for many types of non-random missingness mechanisms.
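The consistency claim for complete-case analysis can be illustrated with a small simulation. The sketch below is not taken from the talk. It assumes one particular non-random mechanism, in which the probability that a covariate value is missing depends on that value itself but not on the outcome given the covariate. The model, the coefficients (intercept 1, slope 2), the missingness function, and the helper names are all hypothetical choices made for illustration.

```python
import math
import random


def ols_intercept_slope(xs, ys):
    """Ordinary least squares fit of y on a single covariate x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate_complete_case(n=20000, seed=1):
    """Simulate y = 1 + 2x + e and delete cases whose x is missing.

    Assumed mechanism (hypothetical): larger x values are more likely
    to be missing, so missingness depends on the unobserved covariate
    value itself, but not on y once x is given.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)
        p_missing = 1.0 / (1.0 + math.exp(-1.5 * x))
        if rng.random() >= p_missing:  # x observed: keep the complete case
            xs.append(x)
            ys.append(y)
    intercept, slope = ols_intercept_slope(xs, ys)
    return intercept, slope, len(xs)


cc_intercept, cc_slope, n_complete = simulate_complete_case()
print(f"complete cases: {n_complete}")
print(f"intercept: {cc_intercept:.3f}, slope: {cc_slope:.3f}")
```

Under this assumed mechanism, selection acts only on the covariate, so the regression of y on x among the complete cases is the same as in the full population, and the estimates should land close to the true values despite roughly half of the cases being dropped. If missingness also depended on y given x, this argument would no longer hold.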
In longitudinal data analysis, dropout or intermittently missing responses are typically dealt with by specifying a joint model for the responses, such as a growth-curve/hierarchical/multilevel model, and estimating the parameters by maximum likelihood. This approach is consistent if missingness of a response depends on observed responses for the same individual, but not if it depends on the response itself or on the random effects in the model. One way of handling such non-random missingness is to model missingness jointly with the response variable of interest, but these joint models are complex, require specialized software, and make unverifiable assumptions. I will suggest simple fixed-effects approaches that are consistent if missingness depends on the random effects and, in the case of binary responses, if missingness depends on the response itself or previous (observed or unobserved) responses.
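As a rough illustration of why a fixed-effects approach can cope with missingness that depends on the random effects, the sketch below simulates a linear model with a subject-specific intercept and dropout whose hazard depends on that intercept. It uses the standard within-subject (demeaning) estimator as one example of a fixed-effects approach, which is not necessarily the method presented in the talk. The comparator is naive pooled least squares rather than the maximum likelihood multilevel model described above. All parameter values and function names are hypothetical.

```python
import math
import random


def ols_slope(ts, ys):
    """Least squares slope of y on t, with an intercept."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    stt = sum((t - mean_t) ** 2 for t in ts)
    sty = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    return sty / stt


def simulate_dropout(n_subjects=5000, n_occasions=5, seed=7):
    """Simulate y_ij = 1 + 0.5 * t_j + u_i + e_ij with dropout driven by u_i."""
    rng = random.Random(seed)
    pooled_t, pooled_y = [], []
    within_t, within_y = [], []
    for _ in range(n_subjects):
        u = rng.gauss(0.0, 1.0)  # subject-specific intercept
        # Assumed mechanism: subjects with larger u are more likely to drop out.
        hazard = 1.0 / (1.0 + math.exp(-(-1.0 + 1.5 * u)))
        ts, ys = [], []
        for t in range(n_occasions):
            # The first two occasions are always observed; afterwards the
            # subject may drop out and contributes no further responses.
            if t >= 2 and rng.random() < hazard:
                break
            ts.append(float(t))
            ys.append(1.0 + 0.5 * t + u + rng.gauss(0.0, 0.5))
        pooled_t += ts
        pooled_y += ys
        # Within transformation: subtract each subject's own observed means,
        # which removes u_i from the responses.
        mean_t = sum(ts) / len(ts)
        mean_y = sum(ys) / len(ys)
        within_t += [t - mean_t for t in ts]
        within_y += [y - mean_y for y in ys]
    return ols_slope(pooled_t, pooled_y), ols_slope(within_t, within_y)


pooled_slope, fe_slope = simulate_dropout()
print(f"pooled slope: {pooled_slope:.3f}")
print(f"fixed-effects (within) slope: {fe_slope:.3f}")
```

In this setup the pooled slope is pulled below the true value of 0.5, because later occasions are dominated by subjects with small intercepts, while the within estimator stays close to 0.5 because demeaning removes the intercept that drives dropout. The sketch covers only the linear case with dropout depending on the random intercept. It does not address the binary-response results mentioned in the abstract.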
Sophia Rabe-Hesketh is a Professor of Education and Biostatistics at the University of California, Berkeley. She was previously a Professor of Social Statistics at the University of London. Her research interests include hierarchical/multilevel models, item-response theory, structural equation models, and generalized latent variable models. She has developed a general model framework, “Generalized Linear Latent and Mixed Models,” that unifies and extends these models, and corresponding software, gllamm, that has been used in over 550 different journals. Some of her recent research is on estimation methods for random effects and latent variable models and on non-ignorable missing data problems. She has co-authored six books, including “Generalized Latent Variable Modeling” and “Multilevel and Longitudinal Modeling Using Stata” (both with Anders Skrondal). Her books and over 100 peer-reviewed journal articles are highly cited, with a Google Scholar h-index of 52. Rabe-Hesketh is a member of the technical advisory committees for the U.S. National Assessment of Educational Progress (NAEP) and the Programme for International Student Assessment (PISA) and an elected member of the International Statistical Institute. She is the current president-elect of the Psychometric Society.
For more information about Sophia Rabe-Hesketh, check out her website! www.gllamm.org/sophia.html