Targeted Prioritized Processing in Overloaded Data Stream Systems

Works, Karen E.

Etd

Targeted Prioritized Processing in Overloaded Data Stream Systems

Public

We are in an era of big data, sensors, and monitoring technology. One consequence of this technology is the continuous generation of massive volumes of streaming data. To support this, stream processing systems have emerged. These systems must produce results while meeting near-real time response obligations. However, computation intensive processing on high velocity streams is challenging. Stream arrival rates are often unpredictable and can fluctuate. This can cause systems to not always be able to process all incoming data within their required response time.Yet inherently some results may be much more significant than others. The delay or complete neglect of producing certain highly significant results could result in catastrophic consequences. Unfortunately, this critical problem of targeted prioritized processing in overloaded environments remains largely unaddressed to date. In this talk, I will describe four key challenges that my dissertation successfully tackled. First, I address the problem of optimally processing the most significant tuples identified by the user at compile-time before less critical ones. Second, I propose a new aggregate operator that increases the accuracy of aggregate results produced for TP systems. Third, I address the problem of identifying and pulling forward significant tuples at run-time via dynamic determinants. Fourth, I design multi-input operators, such as the join operator, which produce multi-stream results in significance order. My experimental studies explore a rich diversity of workloads, queries, and data sets, including real data streams. The results substantiate that my approaches are a significant improvement over the state-of-the-art approaches.

Creator