Bayesian Small Area Analyses of the Unrelated Question Design with Multiple Sensitive Questions

Yu, Yuan

Etd


Public

Eliciting answers to sensitive questions is a delicate issue, and even questions about basic demographics (e.g., age, race, sex) can be offensive to some people. In sample surveys with sensitive questions, randomized response techniques have a clear advantage in estimating population quantities (e.g., the proportion of people cheating on their tax returns) because they can reduce the bias caused by non-response or untruthful response (measurement error). Using hierarchical Bayesian models, we incorporate multiple sensitive questions into the simple unrelated question design for small areas (or clusters).

Most of the work on the unrelated question design relies on large sample sizes to obtain admissible estimates, and there has been limited discussion of applications to data from small areas. Bayesian methods work well because they allow pooling of data across areas with limited data and they can utilize prior information. In addition, few studies have explored the benefits of a combined design involving multiple items (e.g., two sensitive questions) under the Bayesian paradigm. Therefore, in our study, given binary response data from two or more sensitive questions from many small areas, we use a hierarchical Dirichlet-multinomial model to estimate the sensitive proportions. A blocked Gibbs sampler is used to sample from the joint posterior density, from which the posterior distributions of the finite population proportions can be obtained. We apply our method to college cheating data, obtained from our students at WPI with permission from the IRB. We also use a simulation study to validate our method, and we investigate the effects on posterior inference of increasing the number of areas (clusters) and the correlation between the sensitive items.

When there is a large number of areas, our procedure is computationally intensive. Also, the Dirichlet distribution gives negatively correlated probabilities, which is inflexible. Therefore, to make our procedure more useful, we propose a generalized mixed effects model that removes the constraint that the Dirichlet parameters must add up to unity. Based on the new parameter setting, we are able to use either a full Gibbs sampler or an integrated nested normal approximation to make posterior inference about the finite population proportions of students cheating in different courses. This alternative method allows for much faster computing and many more areas (courses). The model has far fewer parameters, and therefore there are gains in precision when the finite population proportions are estimated. It also permits incorporating covariates, when available, in a straightforward manner.

Finally, we propose that our randomized response procedure can be used to provide masked public-use data. Providing such data is an important activity for many government agencies and, although other procedures are in use, the randomized response procedure has never been attempted for privacy protection of released data.
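To make the sample-size issue concrete, the following is a minimal sketch of the classical single-question unrelated question design and its moment estimator, not the hierarchical Bayesian method of the thesis. It assumes each respondent answers the sensitive question with known probability p and an unrelated question otherwise, and that the "yes" proportion for the unrelated question is known. The function names, parameter names, and numeric values are all hypothetical and chosen only for illustration.

```python
import random


def simulate_yes_count(n, pi_sensitive, pi_unrelated, p_sensitive_q, seed=0):
    """Simulate the number of 'yes' answers from n respondents.

    Each respondent privately answers the sensitive question with
    probability p_sensitive_q and the unrelated question otherwise.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    yes = 0
    for _ in range(n):
        if rng.random() < p_sensitive_q:
            yes += int(rng.random() < pi_sensitive)
        else:
            yes += int(rng.random() < pi_unrelated)
    return yes


def estimate_sensitive_proportion(yes, n, pi_unrelated, p_sensitive_q):
    """Moment estimator for the sensitive proportion.

    Uses P(yes) = p * pi_sensitive + (1 - p) * pi_unrelated, solved for
    pi_sensitive with P(yes) replaced by the observed fraction yes / n.
    The result is not constrained to lie in [0, 1].
    """
    observed = yes / n
    return (observed - (1 - p_sensitive_q) * pi_unrelated) / p_sensitive_q


# Large sample: the estimate should land near the assumed true value 0.3.
large_yes = simulate_yes_count(200_000, 0.3, 0.5, 0.7, seed=1)
large_estimate = estimate_sensitive_proportion(large_yes, 200_000, 0.5, 0.7)

# Small area: with 10 respondents and no 'yes' answers, the estimate is
# negative, which is inadmissible as a proportion.
small_estimate = estimate_sensitive_proportion(0, 10, 0.5, 0.7)
```

In a small area the moment estimator can fall outside the unit interval, which is one motivation for pooling information across areas with a hierarchical model as described above.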

Creator