Privacy-Preserving Data Sharing for Genome-Wide Association Studies
Main Authors: | , , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 03-05-2012 |
Subjects: | |
Summary: | Traditional statistical methods for confidentiality protection of statistical
databases do not scale well to GWAS (genome-wide association studies)
databases, especially in terms of guarantees regarding protection from linkage
to external information. The more recent concept of differential privacy,
introduced by the cryptographic community, provides a
rigorous definition of privacy with meaningful privacy guarantees in the
presence of arbitrary external information, although the guarantees come at a
serious price in terms of data utility. Building on such notions, we propose
new methods to release aggregate GWAS data without compromising an individual's
privacy. We present methods for releasing differentially private minor allele
frequencies, chi-square statistics and p-values. We compare these approaches on
simulated data and on a GWAS of canine hair length involving 685 dogs. We
also propose a privacy-preserving method for finding genome-wide associations
based on a differentially private approach to penalized logistic regression. |
---|---|
DOI: | 10.48550/arxiv.1205.0739 |
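
The summary mentions releasing differentially private allele frequencies. As a rough illustration of the general idea, and not the paper's own procedure, the sketch below applies the standard Laplace mechanism to the frequency of one designated allele at a single SNP. The function names, the 0/1/2 genotype coding, and the choice to treat the number of individuals as public are all assumptions made for this example. Replacing one individual's genotype changes the allele count by at most 2, so the frequency (count divided by 2N) changes by at most 1/N.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_allele_frequency(genotypes, epsilon, rng=None):
    """Hypothetical helper: epsilon-DP allele frequency for one SNP.

    genotypes: list with one entry per individual, each 0, 1 or 2
               (copies of the designated allele); this coding is assumed.
    epsilon:   privacy budget spent on this single release.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not genotypes:
        raise ValueError("genotypes must be non-empty")
    if rng is None:
        rng = random.Random()

    n = len(genotypes)
    true_freq = sum(genotypes) / (2 * n)

    # Changing one individual's genotype moves the count by at most 2,
    # hence the frequency by at most 2 / (2n) = 1 / n.
    sensitivity = 1.0 / n
    noisy = true_freq + laplace_noise(sensitivity / epsilon, rng)

    # Clamping to [0, 1] is post-processing and does not weaken the guarantee.
    return min(1.0, max(0.0, noisy))


if __name__ == "__main__":
    sample = [0, 1, 2, 1, 0, 0, 1, 2]  # made-up genotypes, not real data
    print(private_allele_frequency(sample, epsilon=0.5))
```

Releasing many SNPs this way would consume privacy budget for each one, which is one way to see why the summary describes the guarantees as costly in terms of data utility.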