Pulse: Database support for efficient query processing of temporal polynomial models
Main Author: | |
---|---|
Format: | Dissertation |
Language: | English |
Subjects: | |
Summary: | This thesis investigates the practicality and utility of using mathematical models to represent continuous and occasionally unavailable data stream attributes, and of processing relational-style queries in a stream processing engine directly on these models. We present Pulse, a framework for processing continuous queries over stream attributes modeled as piecewise polynomial functions. We use piecewise polynomials to provide a compact, approximate representation of the input dataset, and we provide query language extensions that let users specify precision bounds to control this approximation. Pulse represents queries as simultaneous equation systems for a variety of relational operators, including filters, joins and standard aggregates. In the stream context, we continually solve these equation systems as new data arrives. We have implemented Pulse on top of the Borealis stream processing engine and evaluated it on two real-world datasets from financial and moving object applications. Compared with standard tuple-based stream processing, Pulse achieves significant performance improvements by processing queries directly on the polynomial representation, demonstrating that the approach remains viable while meeting precision requirements.
In addition to our primary contribution of describing the core design and architecture of Pulse, this thesis presents a selectivity estimator and a multi-query optimizer to scale query processing capabilities. Our selectivity estimator uses histograms defined on a parameter space of polynomial coefficients and passes its selectivity estimates to our multi-query optimizer, which then determines how to construct a global query plan that shares work across individual queries. We evaluate these components on both a synthetic dataset and a financial dataset. Our experiments show that our optimization mechanisms significantly reduce processing overhead, and that our estimation algorithm is an accurate, low-overhead estimator for selective operators that can be enhanced by sampling. It is also a general technique that handles operators such as min and max aggregates, for which sampling is known to be inaccurate. |
---|---|
Bibliography: | Source: Dissertation Abstracts International, Volume: 70-10, Section: B, page: 6331. Adviser: Ugur Cetintemel. |
ISBN: | 1109424108 9781109424102 |
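
The Summary describes filters evaluated as equations over piecewise polynomial segments rather than over individual tuples. The sketch below illustrates that general idea only. It is not taken from the thesis or from Pulse's code: the function name `filter_greater`, the restriction to one segment of degree at most two, and the coefficient layout `(a, b, c)` are all assumptions made for illustration.

```python
import math

def filter_greater(coeffs, t_start, t_end, threshold):
    """Return sub-intervals of [t_start, t_end] where a*t^2 + b*t + c > threshold.

    coeffs is (a, b, c) for one hypothetical polynomial segment.
    """
    a, b, c = coeffs
    c -= threshold  # the predicate becomes a*t^2 + b*t + c > 0

    # Real roots of the shifted polynomial; a double root causes no sign
    # change, so it is skipped.
    roots = []
    if a != 0:
        disc = b * b - 4 * a * c
        if disc > 0:
            sq = math.sqrt(disc)
            roots = sorted([(-b - sq) / (2 * a), (-b + sq) / (2 * a)])
    elif b != 0:
        roots = [-c / b]

    # Keep only roots strictly inside the segment as cut points.
    cuts = [t_start] + [r for r in roots if t_start < r < t_end] + [t_end]

    # The sign is constant between consecutive cut points, so one midpoint
    # evaluation decides each piece.
    result = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        if a * mid * mid + b * mid + c > 0:
            result.append((lo, hi))
    return result
```

Under these assumptions, the filter is answered with one root computation per segment, however many raw tuples the segment stands in for. That is the kind of saving the Summary attributes to processing queries on the model instead of on tuples.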