More practical differentially private publication of key statistics in GWAS
Analyses of datasets that contain personal genomic information are very important for revealing associations between diseases and genomes. Genome-wide association studies, which are large-scale genetic statistical analyses, often involve tests with contingency tables. However, if the statistics obta...
Published in: | Bioinformatics advances Vol. 1; no. 1; p. vbab004 |
---|---|
Main Authors: | Yamamoto, Akito; Shibuya, Tetsuo |
Format: | Journal Article |
Language: | English |
Published: | England, Oxford University Press, 2021 |
Subjects: | |
Abstract | Analyses of datasets that contain personal genomic information are very important for revealing associations between diseases and genomes. Genome-wide association studies, which are large-scale genetic statistical analyses, often involve tests with contingency tables. However, if the statistics obtained by these tests are made public as they are, sensitive information of individuals could be leaked. Existing studies have proposed privacy-preserving methods for statistics in the χ² test with a 3 × 2 contingency table, but they do not cover all the tests used in association studies. In addition, existing methods for releasing differentially private P-values are not practical.
In this work, we propose methods for releasing statistics in the χ² test, the Fisher's exact test and the Cochran-Armitage's trend test while preserving both personal privacy and utility. Our methods for releasing P-values are the first to achieve practicality under the concept of differential privacy by considering their base 10 logarithms. We make theoretical guarantees by showing the sensitivity of the above statistics. From our experimental results, we evaluate the utility of the proposed methods and show appropriate thresholds with high accuracy for using the private statistics in actual tests.
A Python implementation of our experiments is available at https://github.com/ay0408/DP-statistics-GWAS.
Supplementary data are available at Bioinformatics Advances online. |
---|---|
AbstractList | Motivation: Analyses of datasets that contain personal genomic information are very important for revealing associations between diseases and genomes. Genome-wide association studies, which are large-scale genetic statistical analyses, often involve tests with contingency tables. However, if the statistics obtained by these tests are made public as they are, sensitive information of individuals could be leaked. Existing studies have proposed privacy-preserving methods for statistics in the χ² test with a 3 × 2 contingency table, but they do not cover all the tests used in association studies. In addition, existing methods for releasing differentially private P-values are not practical. Results: In this work, we propose methods for releasing statistics in the χ² test, the Fisher's exact test and the Cochran-Armitage's trend test while preserving both personal privacy and utility. Our methods for releasing P-values are the first to achieve practicality under the concept of differential privacy by considering their base 10 logarithms. We make theoretical guarantees by showing the sensitivity of the above statistics. From our experimental results, we evaluate the utility of the proposed methods and show appropriate thresholds with high accuracy for using the private statistics in actual tests. Availability and implementation: A Python implementation of our experiments is available at https://github.com/ay0408/DP-statistics-GWAS. Supplementary information: Supplementary data are available at Bioinformatics Advances online. |
Author | Yamamoto, Akito; Shibuya, Tetsuo |
AuthorAffiliation | Division of Medical Data Informatics, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan |
AuthorAffiliation_xml | – name: Division of Medical Data Informatics, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan |
Author_xml | – sequence: 1 givenname: Akito surname: Yamamoto fullname: Yamamoto, Akito organization: Division of Medical Data Informatics, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan – sequence: 2 givenname: Tetsuo surname: Shibuya fullname: Shibuya, Tetsuo organization: Division of Medical Data Informatics, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/36700105$$D View this record in MEDLINE/PubMed |
CitedBy_id | crossref_primary_10_1089_cmb_2022_0246 |
Cites_doi | 10.1371/journal.pgen.1000167 10.2307/3001775 10.1145/1653662.1653726 10.1093/bioinformatics/btw613 10.2307/2983604 10.1515/1544-6115.1776 10.1109/TCBB.2018.2854776 10.1126/science.1165490 10.1214/aoms/1177729694 10.1002/gepi.20536 10.1016/j.biopsych.2019.10.015 10.1145/2976749.2978318 10.1093/bib/bbx068 10.1093/bioinformatics/btz837 10.1197/jamia.M3191 10.7555/JBR.29.20140007 10.2202/1544-6115.1325 |
ContentType | Journal Article |
Copyright | The Author(s) 2021. Published by Oxford University Press. |
Copyright_xml | – notice: The Author(s) 2021. Published by Oxford University Press. – notice: The Author(s) 2021. Published by Oxford University Press. 2021 |
DBID | NPM AAYXX CITATION 7X8 5PM |
DOI | 10.1093/bioadv/vbab004 |
DatabaseName | PubMed CrossRef MEDLINE - Academic PubMed Central (Full Participant titles) |
DatabaseTitle | PubMed CrossRef MEDLINE - Academic |
DatabaseTitleList | MEDLINE - Academic PubMed |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Biology |
EISSN | 2635-0041 |
Editor | Mulder, Nicola |
Editor_xml | – sequence: 1 givenname: Nicola surname: Mulder fullname: Mulder, Nicola |
EndPage | vbab004 |
ExternalDocumentID | 10_1093_bioadv_vbab004 36700105 |
Genre | Journal Article |
GrantInformation_xml | – fundername: ; grantid: JPMJCR1402JST – fundername: ; grantid: 17H01693; 20H05967; 20K21827 |
ISSN | 2635-0041 |
IngestDate | Tue Sep 17 21:30:28 EDT 2024 Fri Aug 16 14:24:48 EDT 2024 Thu Nov 21 23:20:44 EST 2024 Sat Sep 28 08:17:37 EDT 2024 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 1 |
Language | English |
License | The Author(s) 2021. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
OpenAccessLink | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710635/ |
PMID | 36700105 |
PQID | 2769997999 |
PQPubID | 23479 |
ParticipantIDs | pubmedcentral_primary_oai_pubmedcentral_nih_gov_9710635 proquest_miscellaneous_2769997999 crossref_primary_10_1093_bioadv_vbab004 pubmed_primary_36700105 |
PublicationCentury | 2000 |
PublicationDate | 2021-00-00 |
PublicationDateYYYYMMDD | 2021-01-01 |
PublicationDate_xml | – year: 2021 text: 2021-00-00 |
PublicationDecade | 2020 |
PublicationPlace | England |
PublicationPlace_xml | – name: England |
PublicationTitle | Bioinformatics advances |
PublicationTitleAlternate | Bioinform Adv |
PublicationYear | 2021 |
Publisher | Oxford University Press |
Publisher_xml | – name: Oxford University Press |
Snippet | Analyses of datasets that contain personal genomic information are very important for revealing associations between diseases and genomes. Genome-wide... Motivation: Analyses of datasets that contain personal genomic information are very important for revealing associations between diseases and genomes.... |
SourceID | pubmedcentral proquest crossref pubmed |
SourceType | Open Access Repository Aggregation Database Index Database |
StartPage | vbab004 |
SubjectTerms | Original |
Title | More practical differentially private publication of key statistics in GWAS |
URI | https://www.ncbi.nlm.nih.gov/pubmed/36700105 https://search.proquest.com/docview/2769997999 https://pubmed.ncbi.nlm.nih.gov/PMC9710635 |
Volume | 1 |
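The abstract describes releasing P-values under differential privacy by working with their base 10 logarithms and by bounding the sensitivity of each statistic. As a rough illustration only, the following Python sketch applies the standard Laplace mechanism to log10(P). It is not the authors' implementation (see the repository linked in the abstract for that). The function names are hypothetical, and `sensitivity` is a caller-supplied placeholder: the bounds derived in the article are not reproduced in this record, so no particular value is assumed here.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    # The difference of two independent exponential variables with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release_log10_p_value(p_value, sensitivity, epsilon, rng=None):
    """Return log10(p_value) perturbed with Laplace noise of scale sensitivity / epsilon.

    `sensitivity` must be a valid upper bound on how much log10(P) can change
    between neighbouring datasets; it is an assumption supplied by the caller,
    not a value taken from the article.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    if sensitivity <= 0.0 or epsilon <= 0.0:
        raise ValueError("sensitivity and epsilon must be positive")
    if rng is None:
        rng = random.Random()
    return math.log10(p_value) + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Illustrative values only: sensitivity=1.0 and epsilon=1.0 are placeholders.
    noisy = release_log10_p_value(1e-6, sensitivity=1.0, epsilon=1.0)
    print(f"noisy log10(P) = {noisy:.3f}")
```

Under these assumptions, a noisy value could then be compared against a significance threshold expressed on the same log10 scale. Such a comparison is post-processing of an already private output, so it does not consume additional privacy budget.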