Mostly Order Preserving Dictionaries

Bibliographic Details
Published in: 2019 IEEE 35th International Conference on Data Engineering (ICDE), pp. 1214-1225
Main Authors: Liu, Chunwei; Umbenhower, McKade; Jiang, Hao; Subramaniam, Pranav; Ma, Jihong; Elmore, Aaron J.
Format: Conference Proceeding
Language: English
Published: IEEE 01-04-2019
Description
Summary: Dictionary encoding, or domain encoding, is an important form of compression that uses a bijective mapping to replace attributes from a large domain (e.g., strings) with values from a finite domain (e.g., 32-bit integers). This encoding both reduces data storage and allows for more efficient query execution. Traditional dictionary encoding only supports efficient equality queries, while range queries require that encoded values be decoded to evaluate the predicates. An order-preserving dictionary allows range queries without decoding by ensuring that encoded keys follow the same order as the values in the dictionary. While this approach enables efficient queries, it requires that the full set of values be known in order to create the mappings. In this work we bridge this gap by introducing mostly ordered dictionaries, which use best-effort dictionary generation based on sampling the input dataset. Query evaluation on a mostly ordered dictionary avoids decoding when possible, and performance degrades gracefully as the ratio of ordered values decreases.
ISSN: 2375-026X
DOI: 10.1109/ICDE.2019.00111
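
The idea in the summary above can be illustrated with a small sketch. This is not the paper's algorithm: the class name `MostlyOrderedDict`, the "overflow" region for values the sample missed, and the method names are all assumptions made for illustration. Sampled values receive order-preserving codes, so a range predicate on them is evaluated by comparing codes alone; values first seen after the dictionary is built receive unordered codes and must be decoded to be checked.

```python
from bisect import bisect_left, bisect_right


class MostlyOrderedDict:
    """Toy dictionary (hypothetical design, not the paper's): sampled values
    get order-preserving codes, later values get unordered overflow codes."""

    def __init__(self, sample):
        self.ordered = sorted(set(sample))   # code i maps to ordered[i]
        self.code_of = {v: i for i, v in enumerate(self.ordered)}
        self.overflow = []                   # values the sample missed

    def encode(self, value):
        code = self.code_of.get(value)
        if code is None:
            # Unseen value: append to the overflow region; its code carries
            # no ordering information.
            code = len(self.ordered) + len(self.overflow)
            self.overflow.append(value)
            self.code_of[value] = code
        return code

    def decode(self, code):
        n = len(self.ordered)
        return self.ordered[code] if code < n else self.overflow[code - n]

    def range_filter(self, codes, low, high):
        """Return the codes whose values v satisfy low <= v <= high."""
        lo = bisect_left(self.ordered, low)    # first ordered code with value >= low
        hi = bisect_right(self.ordered, high)  # one past last ordered code with value <= high
        n = len(self.ordered)
        out = []
        for c in codes:
            if c < n:
                if lo <= c < hi:               # ordered region: compare codes only
                    out.append(c)
            elif low <= self.decode(c) <= high:  # overflow region: must decode
                out.append(c)
        return out


d = MostlyOrderedDict(sample=["apple", "cherry", "mango"])
column = [d.encode(v) for v in ["mango", "banana", "apple", "cherry", "banana"]]
hits = d.range_filter(column, "b", "d")
print([d.decode(c) for c in hits])  # ['banana', 'cherry', 'banana']
```

In this sketch the cost of a range query grows with the share of overflow codes in the column, which mirrors the graceful degradation the summary describes: a sample that covers every value yields a fully order-preserving dictionary, while a poor sample forces more decoding.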