Utilizing Unimodal and Multimodal Deep Transfer Learning to Classify Mobile Speech Recordings with Mental Health Labels

Hernandez-Reisch, Miranda

The number of people experiencing symptoms of mental illness is rapidly increasing. Questionnaires are traditionally used to screen for mental illness, but prior research has also explored using machine learning models to screen for depression from recordings of clinical interviews. However, these interviews are time-intensive for the patient, and recent advances in deep learning suggest that screening could be performed with shorter voice recordings. In this thesis, we therefore assess how well mental illness can be classified from mobile-device voice recordings of responses to individual clinical interview questions. We recruited crowdsourced participants to download a mobile application, record responses to individual clinical interview questions, and complete anxiety and depression screening questionnaires. We then extracted transcripts from the voice recordings so that both the audio and text modalities could serve as inputs to state-of-the-art unimodal and multimodal deep transfer learning models. For the majority of our individual interview questions, the mobile-collected single-question voice recordings proved more predictive than their longer clinical interview counterparts. We also found that mobile voice recordings are more predictive of depression than of anxiety. Our research thus provides much-needed insight into the development of mobile mental health screening technologies.