Handling Uncertain Data in Array Database Systems
Scientific and intelligence applications have special data handling needs. In these settings, data does not fit the standard model of short coded records that had dominated the data management area for three decades. Array database systems have a specialized architecture to address this problem. Sin...
Saved in:
Published in: | 2008 IEEE 24th International Conference on Data Engineering pp. 1140 - 1149 |
---|---|
Main Authors: | , |
Format: | Conference Proceeding |
Language: | English |
Published: |
IEEE
01-04-2008
|
Subjects: | |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Scientific and intelligence applications have special data handling needs. In these settings, data does not fit the standard model of short coded records that had dominated the data management area for three decades. Array database systems have a specialized architecture to address this problem. Since the data is typically an approximation of reality, it is important to be able to handle imprecision and uncertainty in an efficient and provably accurate way. We propose a discrete approach for value distributions and adopt a standard metric (i.e., variation distance) in probability theory to measure the quality of a result distribution. We then propose a novel algorithm that has a provable upper bound on the variation distance between its result distribution and the "ideal" one. Complementary to that, we advocate the usage of a "statistical mode" suitable for the results of many queries and applications, which is also much more efficient for execution. We show how the statistical mode also presents interesting predicate evaluation strategies. In addition, extensive experiments are performed on real world datasets to evaluate our algorithms. |
---|---|
ISBN: | 9781424418367 1424418364 |
ISSN: | 1063-6382 2375-026X |
DOI: | 10.1109/ICDE.2008.4497523 |