Missing data

Missing data are a common problem in the analysis of real data. For example, in survey applications in social sciences or official statistics, a respondent may skip one or more questions. Or worse, invited subjects may elect not to participate at all. Popular ad-hoc techniques, such as filling in the mean, have side effects that often go unnoticed and that may bias the results.
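One well-known side effect of filling in the mean is that it shrinks the spread of the variable, because every imputed value sits exactly at the centre of the distribution. The sketch below illustrates this on simulated data. The sample size, the distribution and the 40% missingness rate are arbitrary assumptions chosen for illustration only.

```python
import random
import statistics

random.seed(0)

# Hypothetical complete data: 1,000 draws from a normal distribution.
complete = [random.gauss(50.0, 10.0) for _ in range(1000)]

# Delete roughly 40% of the values completely at random (None marks a missing value).
observed = [x if random.random() > 0.4 else None for x in complete]

# Mean imputation: replace every missing value with the mean of the observed values.
observed_values = [x for x in observed if x is not None]
fill = statistics.mean(observed_values)
imputed = [fill if x is None else x for x in observed]

sd_observed = statistics.stdev(observed_values)
sd_imputed = statistics.stdev(imputed)

print(f"SD of observed values:    {sd_observed:.2f}")
print(f"SD after mean imputation: {sd_imputed:.2f}")
```

The mean is unchanged, but the standard deviation after imputation is smaller than that of the observed values, so standard errors and confidence intervals computed from the filled-in data would be too narrow.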

About us

Our missing data group at Utrecht University develops solutions for incomplete data problems that yield correct statistical properties. We developed the robust and straightforward Multivariate Imputation by Chained Equations (MICE) algorithm and made it available through the mice package in R. MICE builds upon the strong foundations of multiple imputation. The key to the solution is to incorporate the additional uncertainty caused by the missing values into the quantitative analysis.
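In multiple imputation, this additional uncertainty enters the analysis when the results from the m imputed datasets are combined, conventionally with Rubin's rules: the total variance is the average within-imputation variance plus an inflated between-imputation variance. The following is a minimal sketch of those rules in Python. It is not code from the mice package, and the function name `pool_rubin` and the input numbers are hypothetical.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules.

    Hypothetical helper for illustration; not part of the mice package.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates, each with a variance")

    q_bar = statistics.mean(estimates)   # pooled point estimate
    u_bar = statistics.mean(variances)   # within-imputation variance
    b = statistics.variance(estimates)   # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b          # total variance
    return q_bar, t


# Made-up estimates of one regression coefficient from m = 3 imputed datasets.
estimates = [1.0, 2.0, 3.0]
variances = [0.5, 0.5, 0.5]

q_bar, t = pool_rubin(estimates, variances)
print(f"pooled estimate: {q_bar:.3f}, total variance: {t:.3f}")
```

The between-imputation term is what a single filled-in dataset cannot provide: when the imputations disagree, the total variance grows accordingly.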

MICE has grown into the de facto standard for scientific analyses in disciplines like survey research, epidemiology and clinical trials. Our group will continue to work on faster, better-behaved imputation algorithms, user-friendly software and well-documented methodologies. We spread our knowledge through lectures, courses and open-source collaborative projects.

Current projects

  • mice: Highly popular open-source R software for solving missing data problems.
  • Ampute: Software to simulate valid missingness mechanisms on multivariate data. See also the vignette.
  • ShinyMice: Model building and evaluation for MICE models.

Books

Van Buuren, S. (2018). Flexible imputation of missing data. Second edition. CRC/Chapman & Hall, Boca Raton, FL.

Researchers involved