Substantial progress has been made in reducing the cost of DNA sequencing, resulting in the generation and availability of large amounts of genomic data. Public availability of this data and sharing of genomic databases between institutions are vital to paving the way towards personalized medicine. However, sharing genomic data carries significant risks, because a genome reveals sensitive information about its owner, such as disease predispositions (e.g., for Alzheimer's), ancestry, and physical attributes. This threat to genomic privacy is magnified by the fact that a person's genome is correlated with the genomes of family members, thus leading to interdependent privacy risks.
In this project, the main research objective is to develop privacy-preserving techniques for sharing genomic databases (and statistics about genomic databases) under diverse settings, including: (i) when a database owner (researcher) shares statistics about its database; (ii) when a database owner shares its entire database (e.g., after data use agreements) with a client; and (iii) when database owners outsource the storage and processing of their databases to a third-party cloud server. The project will also address important and challenging issues, such as verification of shared statistics by a client, liability when shared databases are passed on without authorization, and maximizing the utility of shared data. For all developed techniques, the project team will evaluate and quantify privacy using state-of-the-art genomic privacy quantification algorithms.
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2141622&HistoricalAwar…