Evaluation of approaches for multiple imputation of three-level data


Three-level data, which arise when repeated measures are collected on individuals who are themselves clustered within larger units such as geographical regions, are common in health research. Missing data, a pervasive problem in almost all studies, are particularly prominent in longitudinal studies because these require respondents to participate at multiple time points. Multiple imputation (MI) is a popular approach for handling missing data. Several MI approaches for multilevel data have been developed recently, but most of them can impute two-level data only. To our knowledge, only two implementations were specifically designed to impute missing data in a three-level setting: one within R and the other in the stand-alone software Blimp. Alternatively, general MI approaches designed for single-level and two-level data can be extended to three-level data using dummy indicators or a just-another-variable approach. However, it is currently unclear which of these approaches is preferable. In this study, we compared the performance of these MI methods for imputing incomplete three-level data in a simulation study covering a range of scenarios, with different missing data mechanisms, missing data proportions, and strengths of the level-2 and level-3 intra-cluster correlations. The simulations were based on a case study from the Childhood to Adolescence Transition Study, which collected repeated measures on students who were clustered within schools. We found that all of the approaches considered produced valid inferences under the conditions imposed in the simulation study. Researchers may therefore use either extensions of the single- and two-level approaches or the specialized three-level approaches to adequately handle incomplete three-level data.
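
To make the three-level structure and the dummy-indicator idea concrete, the following is a minimal sketch and not the simulation design used in the study. It assumes a random-intercept model with repeated measures nested in students nested in schools. All function names, sample sizes, and variance values are hypothetical choices for illustration, and the intra-cluster correlation definitions follow one common convention that may differ from the one used in the paper.

```python
import random


def simulate_three_level(n_schools=40, n_students=20, n_waves=3,
                         var_school=0.5, var_student=1.5, var_resid=2.0,
                         seed=1):
    """Simulate y = school effect + student effect + residual (illustrative values)."""
    rng = random.Random(seed)
    rows = []
    for k in range(n_schools):
        u_school = rng.gauss(0.0, var_school ** 0.5)        # level-3 random intercept
        for j in range(n_students):
            u_student = rng.gauss(0.0, var_student ** 0.5)  # level-2 random intercept
            for t in range(n_waves):
                e = rng.gauss(0.0, var_resid ** 0.5)        # level-1 residual
                rows.append({"school": k, "student": (k, j), "wave": t,
                             "y": u_school + u_student + e})
    return rows


def intra_cluster_correlations(var_school, var_student, var_resid):
    """Return (level-3 ICC, level-2 ICC) under one common convention (an assumption)."""
    total = var_school + var_student + var_resid
    icc_level3 = var_school / total                  # same school, different students
    icc_level2 = (var_school + var_student) / total  # same student, different waves
    return icc_level3, icc_level2


def school_dummies(rows):
    """Encode school membership as 0/1 indicators, dropping the first school as reference."""
    schools = sorted({r["school"] for r in rows})
    non_reference = schools[1:]
    return [[1 if r["school"] == s else 0 for s in non_reference] for r in rows]


rows = simulate_three_level()
icc3, icc2 = intra_cluster_correlations(0.5, 1.5, 2.0)
dummies = school_dummies(rows)
```

In a dummy-indicator extension, indicator columns like these would be added as predictors to a single-level or two-level imputation model so that the highest level of clustering is represented by fixed effects rather than random effects.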

Jul 5, 2021