PrivateLink: Privacy-Preserving Integration and Sharing of Datasets

Bibliographic Details
Published in: IEEE Transactions on Information Forensics and Security, Vol. 15, pp. 564-577
Main Authors: Lim, Hoon Wei; Poh, Geong Sen; Xu, Jia; Chittawar, Varsha
Format: Journal Article
Language:English
Published: IEEE 2020
Description
Summary: In privacy-enhancing technology, it has always been challenging to strike a reasonable balance between privacy, efficiency, and usability (utility). To this end, we propose a highly practical solution for the privacy-preserving integration and sharing of datasets among a group of participants. At the heart of our solution is a new interactive protocol, PrivateLink. Through PrivateLink, each participant can randomize his/her dataset via an independent and untrusted third party, such that the resulting dataset can be merged with the randomized datasets contributed by other participants in a privacy-preserving manner. Our approach does not require key sharing among participants in order to integrate different datasets, which in turn leads to a user-friendly and scalable solution. Moreover, the participant can securely verify the correctness of a randomized dataset returned by the third party. We further demonstrate the general utility of PrivateLink by using it to construct a structure-preserving data integration protocol, which is particularly useful for private, fine-grained integration of network traffic data. We state the security of our protocols under the well-established real-ideal simulation paradigm and demonstrate their practicality with a prototype implementation on (1) healthcare datasets and (2) DNS and NetFlow datasets.
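The summary describes the participant/third-party interaction only at a high level. One standard technique with the same shape is Diffie-Hellman-style blind exponentiation, an oblivious PRF, sketched below as an assumption rather than as the construction used in the paper:

1. Each participant hashes an identifier x to a group element h.
2. The participant blinds h with a private random exponent r and sends h^r to the third party.
3. The third party raises the value to its secret k.
4. The participant removes r, obtaining h^k.

Because every participant ends up with the same h^k for the same identifier, the randomized datasets can be joined without participants sharing keys with each other. Meanwhile, the third party only ever sees blinded values.

The class and function names (ThirdParty, link_tokens, randomize_dataset, hash_to_group) are hypothetical. The modulus 2^127 - 1 is a toy choice that is insecure for real use. The paper's correctness-verification step is not modelled here.

```python
import hashlib
import math
import secrets

# Hypothetical sketch of DH-style blind exponentiation (an oblivious PRF);
# NOT necessarily the PrivateLink construction from the paper.
# Toy modulus: the Mersenne prime 2**127 - 1. Insecure (p - 1 is smooth);
# chosen only so the example stays short and self-contained.
P = 2**127 - 1
ORDER = P - 1  # exponents may be reduced modulo p - 1 (Fermat's little theorem)


def hash_to_group(identifier: str) -> int:
    """Hash an identifier to a nonzero square modulo P (hypothetical encoding)."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    h = int.from_bytes(digest, "big") % P
    if h == 0:  # astronomically unlikely; avoid the zero element
        h = 1
    return pow(h, 2, P)


def random_unit_exponent() -> int:
    """Pick a random exponent that is invertible modulo p - 1."""
    while True:
        e = secrets.randbelow(ORDER - 2) + 2
        if math.gcd(e, ORDER) == 1:
            return e


class ThirdParty:
    """Untrusted randomizer holding a secret exponent k that it never shares."""

    def __init__(self) -> None:
        self._k = random_unit_exponent()

    def randomize(self, blinded: list[int]) -> list[int]:
        # Sees only blinded values h^r, never raw identifiers or their hashes.
        return [pow(a, self._k, P) for a in blinded]


def link_tokens(identifiers: list[str], tp: ThirdParty) -> list[int]:
    """Participant side: blind, send to the third party, then unblind."""
    r = random_unit_exponent()  # fresh blinding factor, kept private
    r_inv = pow(r, -1, ORDER)   # r * r_inv == 1 (mod p - 1)
    blinded = [pow(hash_to_group(x), r, P) for x in identifiers]
    randomized = tp.randomize(blinded)              # each value is h^(r*k)
    return [pow(b, r_inv, P) for b in randomized]   # each value is h^k


def randomize_dataset(rows: dict[str, str], tp: ThirdParty) -> dict[int, str]:
    """Replace each record's identifier with its linkable token."""
    ids = list(rows)
    return dict(zip(link_tokens(ids, tp), (rows[i] for i in ids)))


if __name__ == "__main__":
    tp = ThirdParty()
    hospital = randomize_dataset({"alice": "diabetes", "bob": "asthma"}, tp)
    insurer = randomize_dataset({"bob": "plan-B", "carol": "plan-C"}, tp)
    # Join on tokens: neither participant shared its blinding factor r.
    merged = {t: (hospital[t], insurer[t]) for t in hospital.keys() & insurer.keys()}
    print(list(merged.values()))  # [('asthma', 'plan-B')]
```

Using fresh per-participant blinding factors is what removes the need for key sharing among participants in this sketch. Only the third party's exponent k is common, and it never leaves the third party.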
ISSN: 1556-6013, 1556-6021
DOI: 10.1109/TIFS.2019.2924201