Patterns Detection on Multivariate High-Frequency Time Series


As computing power grows and mass storage technology develops, many multivariate high-frequency time series are being produced. Such multi-dimensional high-frequency data are usually large and noisy, and detecting patterns in the underlying structure becomes a challenging task. The purpose of this dissertation is to develop anomaly detection and deep kernel canonical correlation analysis methods for static contiguous time series, and clustering and biclustering methods for dynamic isolated time periods. First, a static anomaly detection method is proposed to study time series corrupted by outliers. The proposed method is similar in spirit to Robust Principal Component Analysis (RPCA) and splits the sample covariance matrix M into two parts, M = F + S, where F is the cleaned sample covariance whose inverse is sparse and computable by Graphical Lasso, and S contains the outliers in M. We accomplish this decomposition by adding an L1 penalty to classic Graphical Lasso. Second, a static non-linear deep kernel canonical correlation analysis method is proposed to study the correlation between two time series data sets. The proposed method works by optimizing over many parameters encoded in a combination of a radial basis function kernel and a deep neural network. This novel structure simultaneously provides a large number of parameters, through the deep neural network, and an infinite-dimensional feature space, by way of the radial basis function kernel and Mercer’s theorem. Third, dynamic multiple-day time series biclustering algorithms are proposed to explore the microstructure of stock price variability in transaction-level intra-day data and to dynamically study patterns of comovement over multiple trading days. We first develop a novel multiple-day time series biclustering algorithm in the linear metric via mean square error scores, and then extend the linear algorithms to non-linear biclustering algorithms via a kernel density estimator (KDE) based jackknife version of mutual information (JMI). The proposed methods not only preserve the contiguity of the time sequence but also dynamically determine the length of such time intervals. Moreover, we effectively estimate the comovement probability of each m-tuple of stocks conditional on the other stocks within the dynamic biclusters.
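
To make the idea of a linear score on a contiguous time window concrete, the following is a minimal sketch only. The abstract does not define its mean square error score, so this example assumes the classic mean squared residue from the biclustering literature as a stand-in. The function name `mean_squared_residue`, its parameters, and the toy `prices` matrix are hypothetical and are not taken from the dissertation.

```python
def mean_squared_residue(matrix, rows, start, end):
    """Score the bicluster formed by the series in `rows` and the
    contiguous time window [start, end). Lower means more coherent.

    Assumption: this is the classic mean squared residue, used here as a
    stand-in; the dissertation's own score may differ.
    """
    # Restrict to the chosen series and a contiguous block of time points,
    # so the ordering of the time sequence is preserved.
    sub = [[matrix[i][t] for t in range(start, end)] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])

    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows

    # Residue of each entry after removing additive row and column effects.
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Toy data (hypothetical): rows are series, columns are time points.
# Series 0 and 1 differ by a constant shift; series 2 moves differently.
prices = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.0],
    [4.0, 1.0, 3.0, 2.0],
]

coherent = mean_squared_residue(prices, [0, 1], 0, 4)
mixed = mean_squared_residue(prices, [0, 1, 2], 0, 4)
```

In this toy case the two series that differ only by a constant shift score zero, while adding the third series raises the score. A search procedure would vary both the set of series and the window endpoints, which is one way the window length could be chosen from the data rather than fixed in advance.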

Last modified
  • 02/01/2021
  • etd-4086
Defense date
  • 2020
Date created
  • 2020-08-09