Scalable Integration View Computation and Maintenance with Parallel, Adaptive and Grouping Techniques


<P>Materialized integration views, constructed by integrating data from multiple distributed data sources, provide better access, more reliable performance, and higher availability for a wide range of applications. In this dissertation, we propose parallel, adaptive, and grouping techniques to address the scalability challenges that increasingly large data sources and high rates of source updates pose for high-performance integration view computation and maintenance.</P>
<P>State-of-the-art parallel integration view computation commonly assumes that maximal pipelined parallelism leads to superior performance. We instead propose <I>segmented bushy</I> parallel processing, which combines pipelined parallelism with alternative forms of parallelism to achieve a more effective overall strategy. Experimental studies conducted on a cluster of high-performance PCs confirm that the proposed strategy improves total processing time by 50% on average compared with existing solutions.</P>
<P>Run-time adaptation becomes critical for parallel integration view computation because of its long-running and memory-intensive nature. We investigate two types of state-level adaptation, namely <I>state spill</I> and <I>state relocation</I>, to address run-time memory shortages. We propose <I>lazy-disk</I> and <I>active-disk</I> approaches that integrate both adaptations to maximize run-time query throughput in a memory-constrained environment. We also propose <I>global throughput-oriented</I> state adaptation strategies for computation plans with multiple state-intensive operators. Extensive experiments confirm the effectiveness of our proposed adaptation solutions.</P>
<P>Once results have been computed and materialized, it is typically more efficient to maintain them incrementally than to recompute them in full.
However, state-of-the-art incremental view maintenance techniques require O($n^2$) maintenance queries, where <I>n</I> is the number of data sources over which the view is defined. Moreover, they do not exploit view definitions or data source processing capabilities to further improve view maintenance performance. We propose novel <I>grouping</I> maintenance algorithms that dramatically reduce the number of maintenance queries to O($n$). We also propose a cost-based view maintenance framework that generates optimized maintenance plans tuned to particular environmental settings. Extensive experimental studies verify the effectiveness of both our maintenance algorithms and the maintenance framework.</P>
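<P>As a generic illustration of incremental maintenance versus full recomputation, the following minimal sketch maintains a two-source join view when rows are inserted into one source. It is not the grouping algorithm proposed in the dissertation. The in-memory relations, the bag representation using <I>Counter</I>, and the function names <I>join</I> and <I>apply_insert_to_r</I> are all assumptions made for this example.</P>

```python
from collections import Counter


def join(r, s):
    """Bag-semantics equi-join of (key, value) rows on their key.

    Each relation is a Counter mapping a (key, value) row to its multiplicity.
    """
    out = Counter()
    for (k1, a), n1 in r.items():
        for (k2, b), n2 in s.items():
            if k1 == k2:
                out[(k1, a, b)] += n1 * n2
    return out


def apply_insert_to_r(view, delta_r, s):
    """Maintain view = join(R, S) after delta_r is inserted into R.

    Only the inserted rows are joined with S; the existing view is reused
    rather than recomputed (hypothetical helper, insert-only case).
    """
    new_view = view.copy()
    new_view.update(join(delta_r, s))  # Counter.update adds multiplicities
    return new_view


# Two small hypothetical sources and their materialized join view.
r = Counter({(1, "a"): 1, (2, "b"): 1})
s = Counter({(1, "x"): 1, (1, "y"): 1, (3, "z"): 1})
view = join(r, s)

# Insert one row into R and maintain the view incrementally.
delta_r = Counter({(3, "c"): 1})
maintained = apply_insert_to_r(view, delta_r, s)

# The incrementally maintained view matches a full recomputation.
recomputed = join(r + delta_r, s)
```

<P>In this sketch the maintenance step touches only the inserted rows and the other source, whereas recomputation joins both sources in full. Deletions and concurrent updates at several sources are not handled here.</P>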

  • English
  • etd-081905-093754
Defense date
  • 2005
Date created
  • 2005-08-19