Durable Top-K Instant-Stamped Temporal Records with User-Specified Scoring Functions

Bibliographic Details
Main Authors:	Gao, Junyang; Sintos, Stavros; Agarwal, Pankaj K.; Yang, Jun
Format:	Journal Article
Language:	English
Published:	24-02-2021
Subjects:	Computer Science - Databases

Description
Summary:	One way to find interesting or exceptional records in instant-stamped temporal data is to consider their "durability": intuitively, how well they compare with other records that arrived earlier or later, and how long they retain their supremacy. For example, people are naturally fascinated by claims with long durability, such as: "On January 22, 2006, Kobe Bryant dropped 81 points against the Toronto Raptors. Since then, this scoring record has yet to be broken." In general, given a sequence of instant-stamped records, suppose that we can rank them by a user-specified scoring function $f$, which may consider multiple attributes of a record to compute a single score for ranking. This paper studies "durable top-$k$ queries," which find records whose scores were among the top $k$ of all records within a "durability window" of a given length, e.g., a 10-year window starting or ending at the timestamp of the record. The parameter $k$, the length of the durability window, and the parameters of the scoring function (which capture user preference) can all be given at query time. We illustrate why this problem formulation yields more meaningful answers in some practical situations than similar types of queries considered previously. We propose new algorithms for solving this problem and provide a comprehensive theoretical analysis of the complexity of the problem itself and of our algorithms. Our algorithms outperform various baselines by up to two orders of magnitude on real and synthetic datasets.
DOI:	10.48550/arxiv.2102.12072
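
The durable top-$k$ query defined in the summary can be illustrated with a naive quadratic-time sketch. This is not the paper's algorithm; it is only a brute-force reading of the definition, roughly the kind of baseline the proposed methods are meant to beat. The names `durable_top_k`, `linear_score`, and `looking_back`, the linear form of the scoring function $f$, the inclusive window bounds, and the tie rule (a record qualifies if fewer than $k$ records in its window score strictly higher) are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# A record is (timestamp, attributes); this layout is an assumption.
Record = Tuple[float, Sequence[float]]


def linear_score(weights: Sequence[float]) -> Callable[[Sequence[float]], float]:
    """Build a hypothetical scoring function f as a weighted sum of attributes."""
    def f(attrs: Sequence[float]) -> float:
        return sum(w * a for w, a in zip(weights, attrs))
    return f


def durable_top_k(
    records: List[Record],
    f: Callable[[Sequence[float]], float],
    k: int,
    window: float,
    looking_back: bool = True,
) -> List[Record]:
    """Brute-force durable top-k: O(n^2) comparisons, for illustration only.

    A record at time t is reported if fewer than k records inside its
    durability window score strictly higher. The window is [t - window, t]
    when looking back and [t, t + window] when looking ahead (both inclusive).
    """
    result = []
    for t, attrs in records:
        lo, hi = (t - window, t) if looking_back else (t, t + window)
        own = f(attrs)
        # Count the records in the window that beat this record's score.
        better = sum(
            1 for t2, attrs2 in records
            if lo <= t2 <= hi and f(attrs2) > own
        )
        if better < k:
            result.append((t, attrs))
    return result
```

Because $k$, the window length, and the weights are ordinary arguments, all three can be changed per call, which mirrors the statement that they are supplied at query time.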