The availability of very large genomic datasets promises a revolution in medicine. However, ensuring the anonymity of participants in such datasets has been shown to be far from straightforward, and sharing data in a privacy-preserving way remains a major bottleneck to medical progress. Recently, a community-driven protocol has been widely adopted for sharing genomic data. This so-called “genomic data-sharing beacon protocol” aims to provide a secure, easy-to-implement, and standardized interface for data sharing by allowing only yes/no queries on the presence of specific alleles in the dataset. Previously deemed robust against privacy threats, the beacon protocol was recently shown to be vulnerable to membership inference attacks despite its stringent policy. Currently, there is no way to systematically assess the privacy risks of beacons for either the genome donors or the beacon operators. This casts doubt on the usability of beacons from both parties' points of view. Setting up a beacon is risky for operators because of the repercussions of possible breaches. Furthermore, donors who lack the technical background to comprehend the risk often find it safer to opt out. Thus, a comprehensive understanding of the system's pitfalls, together with briefing genome donors and beacon operators on potential threats, is essential for moving forward. In this proposal, we aim to (i) detect and analyze vulnerabilities of genomic data-sharing beacons, (ii) provide risk quantification tools that inform both donors and data owners of possible risks, and (iii) develop countermeasures against these vulnerabilities. We present extensive preliminary work on possible vulnerabilities of the beacon system and potential countermeasures. For the first time, we will investigate the information leakage caused by beacon updates, which will guide beacon administrators on when and how to update the content of the beacon.
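To illustrate how an interface that answers only yes/no can still leak membership, here is a toy beacon together with a deliberately simplified attack. It assumes the attacker already holds the target's genome and queries only the target's rare alleles. A non-member's rare alleles are unlikely to all be present in the beacon, so a high fraction of "yes" answers suggests membership. This is an illustrative sketch, not the attack studied in this proposal: published attacks use a likelihood-ratio test that also accounts for allele frequencies and sequencing error. The allele encoding, the names `ToyBeacon`, `membership_score`, and `infer_membership`, and the 0.9 threshold are all hypothetical.

```python
from typing import Iterable, Set, Tuple

# Hypothetical allele encoding: (chromosome, position, alternate base).
Allele = Tuple[str, int, str]


class ToyBeacon:
    """Toy beacon that answers only whether an allele occurs in any donor."""

    def __init__(self, donor_genomes: Iterable[Set[Allele]]) -> None:
        # Pool every donor's alleles; individual carriers are never exposed.
        self._alleles: Set[Allele] = set()
        for genome in donor_genomes:
            self._alleles |= genome

    def query(self, allele: Allele) -> bool:
        # The only public interface: a yes/no answer about presence.
        return allele in self._alleles


def membership_score(beacon: ToyBeacon, target_rare_alleles: Iterable[Allele]) -> float:
    """Fraction of the target's rare alleles for which the beacon answers 'yes'."""
    alleles = list(target_rare_alleles)
    if not alleles:
        return 0.0
    hits = sum(beacon.query(a) for a in alleles)
    return hits / len(alleles)


def infer_membership(
    beacon: ToyBeacon,
    target_rare_alleles: Iterable[Allele],
    threshold: float = 0.9,  # hypothetical decision threshold
) -> bool:
    # Rare alleles of a non-member are unlikely to all be present in the beacon,
    # so a high "yes" fraction is treated as evidence of membership.
    return membership_score(beacon, target_rare_alleles) >= threshold


if __name__ == "__main__":
    donors = [
        {("1", 1000, "A"), ("2", 2000, "T"), ("3", 3000, "G")},
        {("1", 1500, "C"), ("4", 4000, "A")},
    ]
    beacon = ToyBeacon(donors)
    member = {("1", 1000, "A"), ("2", 2000, "T"), ("3", 3000, "G")}
    outsider = {("5", 5000, "T"), ("6", 6000, "C"), ("1", 1500, "C")}
    print(infer_membership(beacon, member))    # True  (3 of 3 answered "yes")
    print(infer_membership(beacon, outsider))  # False (1 of 3 answered "yes")
```

Even though the beacon never reveals which donor carries an allele, the pattern of answers across many rare alleles acts as a fingerprint. This is the kind of leakage a risk quantification tool would need to measure.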
As our second goal, we will design risk quantification algorithms that assess the risks of sharing data and inform both genome donors and beacon operators of them. This will be the first attempt at helping beacon operators and participants make informed decisions. We anticipate that, if this project is realized, the beacon system will become transparent in terms of privacy risks, which will restore the trustworthiness of the system and increase its usability. This, in turn, will tear down the barriers that stand in the way of sharing genomic data and enable all downstream research that benefits from larger datasets. Our final goal is to develop countermeasures that protect sensitive information. We observe that current approaches fail to simultaneously protect the privacy of individuals and provide high data utility. We will implement novel differential privacy and game theory-based techniques to ensure privacy-preserving data sharing with high data utility.
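The abstract does not specify which differential privacy techniques will be used. The sketch below therefore shows only a generic, textbook mechanism applied to beacon answers, not the proposal's method. Randomized response reports the true yes/no answer with probability e^ε/(1+e^ε) and flips it otherwise. For a single answer bit, the probability of any output under either true value then differs by at most a factor of e^ε, which is the ε-differential-privacy condition. Smaller ε gives stronger privacy but less accurate answers, which is the privacy/utility trade-off the abstract refers to. The wrapper class `RandomizedResponseBeacon` and its per-allele caching are hypothetical design choices. The guarantee shown is per allele; how the privacy loss composes across the many alleles a donor carries is not analyzed here.

```python
import math
import random
from typing import Dict, Hashable, Iterable, Optional


def truth_probability(epsilon: float) -> float:
    """Probability of reporting the true bit: e^eps / (1 + e^eps)."""
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    # Written as 1 / (1 + e^-eps) to avoid overflow for large epsilon.
    return 1.0 / (1.0 + math.exp(-epsilon))


def randomized_response(true_answer: bool, epsilon: float, rng: random.Random) -> bool:
    """Report the true yes/no answer with probability e^eps/(1+e^eps), else flip it."""
    return true_answer if rng.random() < truth_probability(epsilon) else not true_answer


class RandomizedResponseBeacon:
    """Hypothetical beacon wrapper that perturbs each allele's answer once and caches it."""

    def __init__(self, alleles: Iterable[Hashable], epsilon: float,
                 seed: Optional[int] = None) -> None:
        truth_probability(epsilon)  # validate epsilon early
        self._alleles = set(alleles)
        self._epsilon = epsilon
        self._rng = random.Random(seed)
        # Caching prevents an attacker from averaging away the noise by re-asking.
        self._cache: Dict[Hashable, bool] = {}

    def query(self, allele: Hashable) -> bool:
        if allele not in self._cache:
            true_answer = allele in self._alleles
            self._cache[allele] = randomized_response(true_answer, self._epsilon, self._rng)
        return self._cache[allele]
```

For example, with ε = ln 3 each answer is truthful with probability 0.75. Caching fixes one noisy answer per allele, because fresh noise on every repeated query would let an attacker recover the true answer by majority vote.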
https://reporter.nih.gov/search/d9cOyhqt8Em-54KqvREuaw/project-details/10031275