Handling Uncertain Data in Array Database Systems

Scientific and intelligence applications have special data handling needs. In these settings, data does not fit the standard model of short coded records that had dominated the data management area for three decades. Array database systems have a specialized architecture to address this problem. Sin...

Full description

Saved in:
Bibliographic Details
Published in:2008 IEEE 24th International Conference on Data Engineering pp. 1140 - 1149
Main Authors: Tingjian Ge, Zdonik, S.
Format: Conference Proceeding
Language:English
Published: IEEE 01-04-2008
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Scientific and intelligence applications have special data handling needs. In these settings, data does not fit the standard model of short coded records that had dominated the data management area for three decades. Array database systems have a specialized architecture to address this problem. Since the data is typically an approximation of reality, it is important to be able to handle imprecision and uncertainty in an efficient and provably accurate way. We propose a discrete approach for value distributions and adopt a standard metric (i.e., variation distance) in probability theory to measure the quality of a result distribution. We then propose a novel algorithm that has a provable upper bound on the variation distance between its result distribution and the "ideal" one. Complementary to that, we advocate the usage of a "statistical mode" suitable for the results of many queries and applications, which is also much more efficient for execution. We show how the statistical mode also presents interesting predicate evaluation strategies. In addition, extensive experiments are performed on real world datasets to evaluate our algorithms.
ISBN:9781424418367
1424418364
ISSN:1063-6382
2375-026X
DOI:10.1109/ICDE.2008.4497523