Etd

MASS: A Multi-Axis Storage Structure for Large XML Documents

Due to the wide acceptance of the World Wide Web Consortium (W3C) XPath language specification, native indexing for XML is needed to support path expression queries efficiently. XPath describes the different document tree relationships that may be queried as a set of axes. Many recent proposals for XML indexing accelerate only a small subset of the expressions that these axes make possible. In particular, queries by ordinal position and updates that alter document structure are not well supported. A more general indexing solution is needed, one that not only evaluates all of the XPath axes efficiently but also supports efficient document updates. We introduce MASS, a Multiple Axis Storage Structure, to meet the performance challenge posed by the XPath language. MASS is a storage and indexing solution for large XML documents that eliminates the need for external secondary storage. It is designed around the XPath language and provides efficient interfaces for evaluating all XPath axes. The clustered organization of MASS allows several different axes to be evaluated using the same index structure. This clustering, in conjunction with an internal compression mechanism that exploits specific XML characteristics, keeps the size of the structure small, which further aids efficiency. MASS introduces a versatile scheme for representing document node relationships that always allows for efficient updates. Finally, the integration of a ranked B+ tree allows MASS to evaluate XPath axes efficiently in large documents. We have implemented MASS in C++ and measured the performance of many different XPath expressions and document updates. Our experimental evaluation illustrates that MASS exhibits excellent performance characteristics for both queries and updates and scales well to large documents, making it a practical solution for XML storage. In conjunction with text indexing, MASS provides a complete solution for XML indexing.

Language
  • English
Identifier
  • etd-0506102-113510
Year
  • 2002
Date created
  • 2002-05-06

Permanent link to this page: https://digital.wpi.edu/show/q237hr963