Robust Methods for Anomaly Detection with Applications to Cyber Data

Zhou, Chong

Etd

Public

In many real-world problems, large outliers and pervasive noise are commonplace, and one may not have access to clean training data. Accordingly, anomaly detection methods are useful for detecting and removing anomalies for further analysis. Robust Principal Component Analysis (Robust PCA) is an example of such a method: it splits data into a sparse anomaly part and a remaining part that can be projected onto a linear low-dimensional manifold.

Our work consists of both methodology development and real-world applications. We generalize Robust PCA from discovering linear manifolds to discovering non-linear relationships in the data. In the recent literature, deep autoencoders and other deep neural networks have demonstrated their effectiveness in exploring non-linear features across many problem domains. Our extension combines deep autoencoders and Robust PCA; it not only maintains a deep autoencoder's ability to discover non-linear features but also eliminates noise. We present generalizations of our results to grouped sparsity norms, which distinguish anomalies from structured corruptions, such as a collection of instances having more corruptions than their fellows. Leveraging grouped norms allows our method to detect row-wise outliers. Both denoising and outlier detection increase the robustness of standard deep autoencoders, and we name our novel method the "Robust Deep Autoencoder" (RDA). This work has been published as a full paper in the research track of the KDD'17 conference. Further, we propose a model consisting of a hierarchical collection of RDAs, which maintains the spirit of stacked denoising autoencoders and hierarchical neural networks. By allowing any advanced autoencoder, such as a sparse autoencoder or a variational autoencoder, to replace the standard autoencoders used previously in the RDA framework, we demonstrate that the RDA framework can be extended to a wide range of deep models. These models include, but are not limited to, grouped-norm regularized sparse autoencoders and variational generative models.

On the practical side, we present real-world applications of Robust PCA and RDA in the cyber security domain, where we analyze the dimensionality of data and find anomalies. We detect anomalies in data arising from high-fidelity simulation networks, and both Robust PCA and RDA provide effective features capable of identifying different Distributed Denial of Service (DDoS) attacks. Finally, we provide a procedure for modifying Robust PCA to adapt to dynamic streaming data. Our contribution has been built into the Adaptive Resource Management Enabling Deception (ARMED) system, which aims to enable better detection and mitigation of DDoS attacks.
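
The low-rank plus sparse split that Robust PCA performs can be illustrated with a short sketch. The code below is not taken from the thesis. It is a minimal NumPy version of the commonly used principal component pursuit formulation, solved with an augmented Lagrangian iteration. The function names (`shrink`, `svd_threshold`, `robust_pca`) and the default choices for `lam` and `mu` are illustrative assumptions drawn from common practice.

```python
import numpy as np


def shrink(X, tau):
    """Element-wise soft thresholding (proximal operator of the l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svd_threshold(X, tau):
    """Soft-threshold the singular values (proximal operator of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt


def robust_pca(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Split M into a low-rank part L and a sparse part S with M ~ L + S.

    Assumed defaults (common heuristics, not values from the thesis):
    lam = 1 / sqrt(max(m, n)) and mu = m * n / (4 * sum(|M|)).
    """
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    if mu is None:
        mu = m * n / (4.0 * np.abs(M).sum())

    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)  # Lagrange multiplier for the constraint M = L + S
    norm_M = np.linalg.norm(M)

    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)   # update low-rank part
        S = shrink(M - L + Y / mu, lam / mu)          # update sparse part
        residual = M - L - S
        Y = Y + mu * residual                         # dual ascent step
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    low_rank = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 60))
    sparse = np.zeros((60, 60))
    mask = rng.random((60, 60)) < 0.05                # about 5% corrupted entries
    sparse[mask] = 10.0 * rng.choice([-1.0, 1.0], size=mask.sum())
    L, S = robust_pca(low_rank + sparse)
    print("relative error of recovered low-rank part:",
          np.linalg.norm(L - low_rank) / np.linalg.norm(low_rank))
```

Entries of `S` with large magnitude are the candidate anomalies. A row-wise (grouped) variant would replace the element-wise `shrink` with a shrinkage applied to whole rows, and the RDA described above replaces the low-rank projection with a deep autoencoder.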

Creator