Towards Federated Learning Approach to Determine Data Relevance in Big Data

Bibliographic Details
Published in: 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI), pp. 184-192
Main Authors: Doku, Ronald; Rawat, Danda B.; Liu, Chunmei
Format: Conference Proceeding
Language: English
Published: IEEE 01-07-2019
Description
Summary: In the past few years, data has proliferated to astronomical proportions; as a result, big data has become the driving force behind many machine learning innovations. However, the incessant generation of data in the information age poses a needle-in-a-haystack problem: it has become challenging to distinguish useful data from a heap of irrelevant data. This has resulted in a quality-over-quantity issue in data science, where a lot of data is being generated but the majority of it is irrelevant. Furthermore, most of the data and the resources needed to effectively train machine learning models are owned by major tech companies, resulting in a centralization problem. Federated learning seeks to transform how machine learning models are trained by adopting a distributed machine learning approach. Another promising technology is blockchain, whose immutable nature ensures data integrity. By combining blockchain's trust mechanism with federated learning's ability to disrupt data centralization, we propose an approach that determines relevant data and stores the data in a decentralized manner.
DOI: 10.1109/IRI.2019.00039