Model Selection when multiple imputation is used to protect confidentiality in public use data

Satkartar Kaur Kinney; Jerome Reiter; James O. Berger

Kinney, S., Reiter, J., & Berger, J. O. (2010). Model Selection when multiple imputation is used to protect confidentiality in public use data. Journal of Privacy and Confidentiality, 2(2), Article 2. https://doi.org/10.29012/jpc.v2i2.588

Abstract

Several statistical agencies use, or are considering the use of, multiple imputation to limit the risk of disclosing respondents' identities or sensitive attributes in public use files. For example, agencies can release partially synthetic datasets, comprising the units originally surveyed with some values, such as sensitive values at high risk of disclosure, or values of key identifiers, replaced with multiple imputations. We describe how secondary analysts of such multiply-imputed datasets can implement Bayesian model selection procedures that appropriately condition on the multiple datasets and the information released by the agency about the imputation models. We illustrate by deriving Bayes factor approximations and a data augmentation step for stochastic search variable selection algorithms.
