Reptile: Aggregation-level Explanations for Hierarchical Data
Main Authors: | , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 11-03-2021 |
Subjects: | |
Summary: | Recent query explanation systems help users understand anomalies in
aggregation results by proposing predicates that describe input records that,
if deleted, would resolve the anomalies. However, it can be difficult for users
to understand how a predicate was chosen, and these approaches are limited to
errors that can be resolved through deletion. In contrast, data errors may be
due to group-wise errors, such as missing records or systematic value errors.
This paper presents Reptile, an explanation system for hierarchical data. Given
an anomalous aggregate query result, Reptile recommends the next drill-down
attribute, and ranks the drill-down groups by the extent to which repairing each
group's statistics to their expected values resolves the anomaly. Reptile
efficiently trains a multi-level model that leverages the data's hierarchy to
estimate the expected values, and uses a factorised representation of the
feature matrix to remove redundancies due to the data's hierarchical structure.
We further extend model training to support factorised data, and develop a
suite of optimizations that leverage the data's hierarchical structure. Reptile
reduces end-to-end runtimes by more than 6 times compared to a Matlab-based
implementation, correctly identifies 21/30 data errors in the Johns Hopkins
COVID-19 data, and correctly resolves 20/22 complaints in a user study using
data and researchers from Columbia University's Financial Instruments Sector
Team. |
---|---|
DOI: | 10.48550/arxiv.2103.07037 |