Scaling Optimizations for Large-Scale Distributed Data with Lightweight Coresets
Published in: | 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) pp. 426 - 429 |
---|---|
Main Authors: | , , |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 01-05-2020 |
Summary: | Lightweight coresets are compact representations of data sets on which clustering methods achieve results competitive with those obtained on the complete data set. They are constructed by sampling important points from the complete set. We propose a fast method to approximate the sampling of lightweight coresets from very large data sets that are distributed among multiple machines. We show that the proposed method is much faster and more scalable, reaching speedups of 48 times over the original lightweight coresets, while retaining similar properties. |
DOI: | 10.1109/IPDPSW50202.2020.00078 |
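
The summary describes lightweight coresets as built by sampling important points from the complete set. As a rough illustration, the sketch below follows the commonly described single-machine construction: each point is drawn with a probability that mixes a uniform term with a term proportional to its squared distance from the data mean, and each draw is weighted by the inverse of its sampling probability. It is not the distributed approximation proposed in this paper; the function name `lightweight_coreset`, its parameters, and the fallback for identical points are assumptions made for illustration only.

```python
import random


def lightweight_coreset(points, m, seed=0):
    """Sample m weighted points from `points` (a list of equal-length tuples).

    Hypothetical helper: a single-machine sketch, not the paper's method.
    """
    rng = random.Random(seed)
    n = len(points)
    dim = len(points[0])

    # Mean of the full data set.
    mean = [sum(p[j] for p in points) / n for j in range(dim)]

    # Squared Euclidean distance of every point to the mean.
    sq_dist = [sum((p[j] - mean[j]) ** 2 for j in range(dim)) for p in points]
    total = sum(sq_dist)

    # Half uniform, half proportional to squared distance from the mean.
    # Assumption: fall back to uniform sampling when all points coincide.
    if total > 0:
        q = [0.5 / n + 0.5 * d / total for d in sq_dist]
    else:
        q = [1.0 / n] * n

    # Draw m indices independently (with replacement) according to q.
    chosen = rng.choices(range(n), weights=q, k=m)

    # Weight each sampled point by 1 / (m * q) so weighted sums stay unbiased.
    return [(points[i], 1.0 / (m * q[i])) for i in chosen]


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
    for point, weight in lightweight_coreset(data, m=3):
        print(point, weight)
```

The resulting weighted sample can then be passed to any clustering routine that accepts per-point weights, in place of the full data set.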