Automatic Emotion Detection in Text Messages using Supervised Learning

Hasan, Maryam

Etd

Public

Emotion detection from text is the task of detecting affective states in natural language artifacts such as comments, reviews, messages, and social media posts. Emotion detection tools could be employed in fields ranging from social science, political science, and public health research to marketing research. In this dissertation, we study the problem of detecting and analyzing emotion in textual data using traditional machine learning and deep learning methods. Emotion detection entails classifying text into emotion categories such as happiness, sadness, and anger. Supervised emotion classification is challenging because labeled data resources are scarce. It is further complicated by a high-dimensional feature space and a large number of emotion categories. This dissertation designs, develops, and evaluates three innovative strategies. First, we develop Emotex, a supervised emotion classification approach using static feature vectors. Feature extraction is a fundamental building block of emotion classification systems. To address the high-dimensional feature space, Emotex relies on hand-crafted features selected from lexicons that capture word-emotion associations. Emotex uses the hashtags embedded in text messages to label the expressed emotions automatically. In this way it builds a large corpus of emotion-labeled messages for training emotion classifiers with no manual effort. Our experimental results show that Emotex models achieve about 90% accuracy on test data for multi-class emotion classification. However, because emotions are expressed in diverse ways across domains, Emotex requires extensive hand-crafted features to achieve high performance. Such features are time-consuming to create and may be incomplete.
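The hashtag-based labeling that Emotex relies on can be illustrated with a minimal sketch. The hashtag-to-emotion map below is hypothetical (the abstract does not list the hashtags used), and the choices to skip messages with conflicting labels and to strip the label hashtag from the text are assumptions made for this illustration, not details confirmed by the abstract.

```python
import re

# Hypothetical map: the abstract does not say which hashtags Emotex uses.
HASHTAG_TO_EMOTION = {
    "happy": "happiness",
    "joy": "happiness",
    "sad": "sadness",
    "depressed": "sadness",
    "angry": "anger",
    "furious": "anger",
}

HASHTAG_RE = re.compile(r"#(\w+)")


def label_by_hashtag(message):
    """Return (cleaned_text, emotion), or None if there is no single clear label."""
    tags = [t.lower() for t in HASHTAG_RE.findall(message)]
    emotions = {HASHTAG_TO_EMOTION[t] for t in tags if t in HASHTAG_TO_EMOTION}
    # Assumption: messages with zero or conflicting emotion hashtags are skipped.
    if len(emotions) != 1:
        return None

    # Assumption: label hashtags are removed so that a classifier cannot
    # simply read the label back from the text; other hashtags are kept.
    def drop_label_tag(match):
        return "" if match.group(1).lower() in HASHTAG_TO_EMOTION else match.group(0)

    cleaned = " ".join(HASHTAG_RE.sub(drop_label_tag, message).split())
    return cleaned, emotions.pop()
```

For example, `label_by_hashtag("Got the job today #happy #blessed")` yields the text without `#happy` together with the label `happiness`, while a message carrying both `#happy` and `#sad` is discarded.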
To remove this dependence on hand-crafted features, we develop a deep learning approach called DeepEmotex that learns emotion-specific features from the input textual context instead of using static hand-crafted features. In particular, DeepEmotex learns emotion-specific features using sequential transfer learning. For this, we develop a sequential transfer learning framework to fine-tune pre-trained language models. More precisely, DeepEmotex utilizes two state-of-the-art pre-trained models, BERT and Universal Sentence Encoder (USE). We analyze the adaptation, or fine-tuning, phase during which the pre-trained knowledge is transferred to our emotion classification task. We fine-tune our models on a training dataset of 300,000 tweets, validate on 60,000 tweets, and test on 180,525 tweets. By fine-tuning USE, we achieve an overall accuracy of 91% on our test dataset. Using different batch sizes to fine-tune BERT, we achieve 92% accuracy on our test data. We also evaluate the performance of DeepEmotex models in classifying emotion in the EmoInt benchmark dataset, where they obtain state-of-the-art performance. Evaluation results show that the proposed BERT model outperforms the previous state-of-the-art result, obtained with a Bidirectional-LSTM-CNN model, by 3%. After developing emotion classification models, we deploy the trained models to analyze live streams of tweets. For this, we develop a framework called EmotexStream. First, a binary classifier separates tweets with explicit emotion from tweets without emotion. Then, our emotion classification models perform a fine-grained emotion classification of the tweets with explicit emotion. We also propose an online method to measure public emotion and detect abrupt changes in emotion as emotion-burst moments in live text streams. 
Through a series of case studies, we confirm that the proposed methods are able to detect emotion-critical moments during real-life events.
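The abstract does not specify how emotion-burst moments are detected. As one possible illustration only, the sketch below flags a burst when the share of tweets carrying a given emotion in the latest time interval rises far above its recent average. The sliding-window z-score rule, the class name `BurstDetector`, and the parameters `window`, `threshold`, and `min_std` are all assumptions introduced here, not the method proposed in the dissertation.

```python
from collections import deque
from statistics import mean, pstdev


class BurstDetector:
    """Hypothetical online detector based on a sliding-window z-score."""

    def __init__(self, window=5, threshold=3.0, min_std=0.01):
        self.window = window          # number of past intervals to compare against
        self.threshold = threshold    # z-score above which an interval is a burst
        self.min_std = min_std        # floor on the deviation for flat histories
        self.history = deque(maxlen=window)

    def update(self, share):
        """Feed the emotion share of the latest interval; return True on a burst."""
        is_burst = False
        if len(self.history) == self.window:
            mu = mean(self.history)
            sigma = max(pstdev(self.history), self.min_std)
            is_burst = (share - mu) / sigma > self.threshold
        # Record the new value after the comparison so that it does not
        # influence its own baseline.
        self.history.append(share)
        return is_burst


# Share of tweets labeled "anger" per interval (made-up numbers).
detector = BurstDetector()
shares = [0.10, 0.11, 0.09, 0.10, 0.10, 0.45, 0.11]
flags = [detector.update(s) for s in shares]
```

With these made-up shares, only the sixth interval (0.45) is flagged. In a pipeline like the one described, the per-interval shares would come from the fine-grained classifier applied to tweets that the binary classifier marked as emotional.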

Creator