BAYESIAN PREDICTIVE INFERENCE WITH STICK-BREAKING PROCESSES AND SURVEY WEIGHTS

Yang, Lingli

In this dissertation, we aim to perform robust Bayesian predictive inference by incorporating two key elements, the stick-breaking process (specifically the two-parameter Pitman-Yor process) and survey weights, in order to overcome the inherent limitations of traditional analytical approaches. The stick-breaking process is a stochastic process used in Bayesian nonparametric statistics to generate distributions over an infinite number of components or clusters, which adds flexibility by allowing the complexity of the model to grow with the data. Traditional methods include weights when estimating a population quantity of interest, but they rarely use a normalization constant when fitting a model. However, incorporating weights directly into the model can capture information that the key variables cannot fully account for in complex survey designs, such as those involving nonresponse and nonprobability samples, and for Bayesian analysis, normalized densities should be used. Furthermore, we introduce a thorough Bayesian predictive inference framework for binary and continuous study variables that uses surrogate sampling. In Chapter 2, we examine the different types of survey weights and weighted densities, and then incorporate auxiliary information into the models. In this chapter, we concentrate on binary data, which arise in a wide range of circumstances. First, we discuss models without covariates. Based on a simulation study and an application to body mass index data, the findings indicate that trimmed weight methods perform better than untrimmed weight methods, and that adjusted weight methods are more variable than unadjusted weight methods. Second, we include covariates through weighted logistic regression models. We investigate six different models with three different adjusted survey weight variations.
A simulation study and a real-world body mass index (BMI) dataset are used to evaluate the relative performance and effectiveness of the six models. Our findings show that the normalized model with adjusted trimmed weights performs better than the alternatives while remaining consistent with Bayesian theory. In Chapter 3, we show the benefit of incorporating a stick-breaking process, together with survey weights, in our models to make Bayesian predictive inferences about the finite population mean. Using small area estimation techniques, we treat each unique combination of covariates as a separate domain and infer the finite population mean of the continuous study variable. Two robust models are employed: the stick-breaking process (SB) model and the stick-breaking process and mixture distribution of study variable (RSB) model. By generating the partition matrix using the designated cluster, we incorporate the stick-breaking process beforehand. Additionally, we demonstrate how to use survey weights in the robust models for probability survey data. We compare the robust models with one baseline model, the Battese-Harter-Fuller (BHF) model. Our goal is for domains with similar attributes to yield similar predictions. Ultimately, the models that use the stick-breaking process show less variation than the non-clustered model, which was the desired outcome. In Chapter 4, we extend one of the robust models from Chapter 3 by incorporating the stick-breaking process into the study variable. To strengthen the model's robustness, we propose a novel dual stick-breaking process model, which employs the stick-breaking process twice: first to cluster the study variable, and then to cluster the random effects. This model allows any number of clusters to be generated on the study variable, a significant advance over the previous models, which allow only one or two clusters.
The results show that the dual stick-breaking process model outperforms the Battese-Harter-Fuller model in both accuracy and precision. Overall, we provide multiple Bayesian hierarchical models that allow predictive inferences to be made about a study variable while incorporating covariates, weights, and the stick-breaking process. These results pave the way for further extensions of alternative weighted models in survey sampling. Our results show that the normalized model with adjusted trimmed weights has better performance. Then, by adding a clustering component via the stick-breaking process and incorporating the adjusted trimmed weights into the Battese-Harter-Fuller model, we are able to group units with similar attributes, resulting in a reduction of bias. Our models expand the scope of the applications we can explore in survey sampling.
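
For readers unfamiliar with the stick-breaking construction mentioned in the abstract, the following is a minimal sketch of how truncated two-parameter Pitman-Yor stick-breaking weights can be drawn. It is an illustration under stated assumptions, not the dissertation's implementation: the function name `pitman_yor_stick_breaking`, its parameter names, and the truncation at a fixed number of sticks are all hypothetical choices made here. The sketch uses the standard construction in which the k-th stick fraction is drawn from Beta(1 - d, theta + k d), with discount d in [0, 1) and strength theta > -d.

```python
import numpy as np


def pitman_yor_stick_breaking(discount, strength, n_sticks, rng=None):
    """Draw truncated Pitman-Yor stick-breaking weights (illustrative sketch).

    discount : d, with 0 <= d < 1 (d = 0 gives the Dirichlet process case)
    strength : theta, with theta > -d
    n_sticks : truncation level (an assumption of this sketch; the
               untruncated process has infinitely many components)
    rng      : None, an integer seed, or a numpy Generator
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must satisfy 0 <= discount < 1")
    if strength <= -discount:
        raise ValueError("strength must be greater than -discount")
    if n_sticks < 1:
        raise ValueError("n_sticks must be at least 1")

    rng = np.random.default_rng(rng)
    k = np.arange(1, n_sticks + 1)

    # Stick fractions: V_k ~ Beta(1 - d, theta + k * d).
    v = rng.beta(1.0 - discount, strength + k * discount)

    # Truncation: the last stick takes all remaining mass so weights sum to 1.
    v[-1] = 1.0

    # Mass remaining before each break: prod_{j < k} (1 - V_j).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))

    # Component weights: w_k = V_k * prod_{j < k} (1 - V_j).
    return v * remaining


weights = pitman_yor_stick_breaking(discount=0.25, strength=1.0, n_sticks=50, rng=0)
```

The resulting `weights` vector is a probability distribution over the (truncated) set of clusters. A larger discount spreads mass over more clusters, which is one way the model complexity can grow with the data.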

Creator