Scaling Optimizations for Large-Scale Distributed Data with Lightweight Coresets

Bibliographic Details
Published in: 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 426-429
Main Authors: Pinheiro, Daniel N., Xavier-de-Souza, Samuel, Aloise, Daniel
Format: Conference Proceeding
Language: English
Published: IEEE 01-05-2020
Description
Summary: Lightweight coresets are compact representations of data sets on which clustering methods achieve results competitive with those obtained on the complete data set. They are constructed by sampling important points from the complete set. We propose a fast method to approximate the sampling of lightweight coresets from very large data sets that are distributed among multiple machines. We show that the proposed method is faster and more scalable than the original lightweight coreset construction, reaching speedups of 48 times while retaining similar properties.
DOI: 10.1109/IPDPSW50202.2020.00078
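
To make "sampling important points" concrete, the following is a minimal single-machine sketch of the commonly described lightweight coreset construction, in which each point is drawn with a probability that mixes a uniform term with a term proportional to its squared distance from the data mean. It is not the distributed approximation proposed in this paper, whose details the record does not give. The function name `lightweight_coreset`, its parameters, and the tuple-based data layout are assumptions made for illustration only.

```python
import random


def lightweight_coreset(points, m, seed=0):
    """Return m (point, weight) pairs sampled from `points`.

    `points` is a list of equal-length tuples of floats (an assumed layout).
    """
    n = len(points)
    dim = len(points[0])

    # Mean of the full data set.
    mean = [sum(p[j] for p in points) / n for j in range(dim)]

    # Squared Euclidean distance of every point to the mean.
    sq_dist = [sum((p[j] - mean[j]) ** 2 for j in range(dim)) for p in points]
    total = sum(sq_dist)

    # Sampling distribution: half uniform, half proportional to squared distance.
    if total > 0:
        q = [0.5 / n + 0.5 * d / total for d in sq_dist]
    else:
        # All points coincide with the mean, so fall back to uniform sampling.
        q = [1.0 / n] * n

    rng = random.Random(seed)
    idx = rng.choices(range(n), weights=q, k=m)

    # Weight 1 / (m * q) makes the weighted sample an unbiased estimate
    # of sums taken over the full data set.
    return [(points[i], 1.0 / (m * q[i])) for i in idx]


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
    for point, weight in lightweight_coreset(data, m=3):
        print(point, round(weight, 3))
```

Computing the mean and the total squared distance both require passes over the entire data set in this sketch, which is the kind of cost a distributed approximation would aim to reduce.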