Durable Top-K Instant-Stamped Temporal Records with User-Specified Scoring Functions
Format: | Journal Article |
---|---|
Language: | English |
Published: | 24-02-2021 |
Summary: | A way of finding interesting or exceptional records from instant-stamped
temporal data is to consider their "durability," or, intuitively speaking, how
well they compare with other records that arrived earlier or later, and how
long they retain their supremacy. For example, people are naturally fascinated
by claims with long durability, such as: "On January 22, 2006, Kobe Bryant
dropped 81 points against the Toronto Raptors. Since then, this scoring record has
yet to be broken." In general, given a sequence of instant-stamped records,
suppose that we can rank them by a user-specified scoring function $f$, which
may consider multiple attributes of a record to compute a single score for
ranking. This paper studies "durable top-$k$ queries", which find records whose
scores were within the top $k$ among the records in a "durability window" of a
given length, e.g., a 10-year window starting or ending at the timestamp of the
record. The parameter $k$, the length of the durability window, and parameters
of the scoring function (which capture user preference) can all be given at the
query time. We illustrate why this problem formulation yields more meaningful
answers in some practical situations than other similar types of queries
considered previously. We propose new algorithms for solving this problem, and
provide a comprehensive theoretical analysis of the complexities of the problem
itself and of our algorithms. Our algorithms vastly outperform various
baselines (by up to two orders of magnitude on real and synthetic datasets). |
DOI: | 10.48550/arxiv.2102.12072 |
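To make the durable top-$k$ query from the summary concrete, here is a minimal brute-force sketch in Python. It rests on several assumptions. The scoring function $f$ is taken to be a weighted sum of attributes, with the weights supplied at query time. A record counts as within the top $k$ when fewer than $k$ other records in its durability window score strictly higher. The names `Record`, `linear_score`, and `durable_top_k`, and the toy data, are hypothetical. Because it compares every pair of records, it runs in $O(n^2)$ time. It is meant only as a naive baseline for illustration, not as the algorithm proposed in the paper.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Record:
    """Hypothetical instant-stamped record: an id, a timestamp, and attributes."""
    rid: str
    t: float                  # timestamp (e.g., years since some origin)
    attrs: tuple[float, ...]  # attributes combined by the scoring function


def linear_score(rec: Record, weights: Sequence[float]) -> float:
    """Assumed scoring function f: a weighted sum of the record's attributes.
    The weights stand in for user preferences given at query time."""
    return sum(w * a for w, a in zip(weights, rec.attrs))


def durable_top_k(records: Sequence[Record], weights: Sequence[float],
                  k: int, tau: float, look_back: bool = True) -> list[str]:
    """Brute-force durable top-k (naive O(n^2) baseline, not the paper's method).

    Returns the ids of records whose score is within the top k among all
    records in the durability window: [t - tau, t] when look_back is True,
    otherwise [t, t + tau]. Ties are assumed not to push a record out.
    """
    scores = [linear_score(r, weights) for r in records]
    durable = []
    for i, r in enumerate(records):
        lo, hi = (r.t - tau, r.t) if look_back else (r.t, r.t + tau)
        # Count the other records in the window that strictly beat r.
        better = sum(
            1 for j, other in enumerate(records)
            if j != i and lo <= other.t <= hi and scores[j] > scores[i]
        )
        if better < k:
            durable.append(r.rid)
    return durable


# Toy data (hypothetical): attrs = (points, assists), t = years since origin.
records = [
    Record("a", 0, (60, 5)),
    Record("b", 3, (50, 10)),
    Record("c", 6, (80, 2)),
    Record("d", 10, (70, 3)),
    Record("e", 14, (65, 8)),
]

print(durable_top_k(records, (1.0, 0.0), k=1, tau=10))                   # ['a', 'c']
print(durable_top_k(records, (1.0, 2.0), k=1, tau=10))                   # ['a', 'b', 'c']
print(durable_top_k(records, (1.0, 0.0), k=1, tau=10, look_back=False))  # ['c', 'd', 'e']
```

The three calls show how the query-time parameters from the summary change the answer. Reweighting the attributes makes "b" durable, because it ties "a" rather than losing to it. Switching the window to start at each record's timestamp instead of ending there favors the later records.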