Etd

Bayesian Logistic Regression Model with Integrated Multivariate Normal Approximation for Big Data

Public

Downloadable Content

open in viewer

The analysis of big data is of great interest today, and this comes with challenges of improving precision and efficiency in estimation and prediction. We study binary data with covariates from numerous small areas, where direct estimation is not reliable, and there is a need to borrow strength from the ensemble. This is generally done using Bayesian logistic regression, but because there are numerous small areas, the exact computation for the logistic regression model becomes challenging. Therefore, we develop an integrated multivariate normal approximation (IMNA) method for binary data with covariates within the Bayesian paradigm, and this procedure is assisted by the empirical logistic transform. Our main goal is to provide the theory of IMNA and to show that it is many times faster than the exact logistic regression method with almost the same accuracy. We apply the IMNA method to the health status binary data (excellent health or otherwise) from the Nepal Living Standards Survey with more than 60,000 households (small areas). We estimate the proportion of Nepalese in excellent health condition for each household. For these data IMNA gives estimates of the household proportions as precise as those from the logistic regression model and it is more than fifty times faster (20 seconds versus 1,066 seconds), and clearly this gain is transferable to bigger data problems.

Creator
Contributors
Degree
Unit
Publisher
Language
  • English
Identifier
  • etd-042816-071509
Keyword
Advisor
Defense date
Year
  • 2016
Date created
  • 2016-04-28
Resource type
Rights statement

Relations

In Collection:

Items

Items

Permanent link to this page: https://digital.wpi.edu/show/4t64gn35d