In today's data-driven world, data owners share sensitive information with a wide range of service providers (SPs) to receive personalized services or contribute to scientific studies. In doing so, they want assurance that SPs will comply with data usage agreements and will not share their data without authorization. If their data is nevertheless distributed without authorization, data owners want to detect the leakage and identify its source so that the responsible SP(s) can be held liable. Digital fingerprinting embeds a unique mark (called a fingerprint) into each shared copy of a digital object, so that a leaked copy can be traced back to the SP that received it. However, existing fingerprinting techniques are not directly applicable to sharing sensitive, correlated, and high-value (in terms of utility) data because (i) they (especially those designed for multimedia data) rely on high redundancy in the data, (ii) the embedded marks need to be large to provide robustness against attacks, which reduces the utility of the shared data, and (iii) they do not consider the correlations between data points, which reduces the robustness of the fingerprint. These challenges, which are unique to fingerprinting correlated data, call for fundamentally new fingerprinting algorithms that also provide high data utility and data privacy.
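To make the basic idea concrete, the following is a minimal sketch of a naive fingerprinting scheme, not the investigators' method. It assumes integer-valued data and a hypothetical owner-held secret key; the function names (`fingerprint_positions`, `embed`, `accuse`) and the `fraction` parameter are illustrative only. Each SP's copy has the least significant bit flipped at pseudo-random positions derived from the key and the SP's identifier, and a leaked copy is attributed to the SP whose positions match best.

```python
import hashlib
import hmac
import random


def fingerprint_positions(secret, sp_id, n, fraction):
    """Derive a repeatable set of positions for one SP (assumed scheme)."""
    digest = hmac.new(secret, sp_id.encode(), hashlib.sha256).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    k = max(1, int(n * fraction))
    return sorted(rng.sample(range(n), k))


def embed(data, secret, sp_id, fraction=0.05):
    """Return a copy of `data` with the LSB flipped at the SP's positions."""
    marked = list(data)
    for i in fingerprint_positions(secret, sp_id, len(data), fraction):
        marked[i] ^= 1
    return marked


def accuse(original, leaked, secret, sp_ids, fraction=0.05):
    """Score each SP by the share of its positions that differ in the leak."""
    scores = {}
    for sp_id in sp_ids:
        pos = fingerprint_positions(secret, sp_id, len(original), fraction)
        scores[sp_id] = sum(original[i] != leaked[i] for i in pos) / len(pos)
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    data = [i % 7 for i in range(1000)]          # toy integer data
    leaked = embed(data, b"owner-secret", "sp_b")
    guilty, scores = accuse(data, leaked, b"owner-secret",
                            ["sp_a", "sp_b", "sp_c"])
    print(guilty, scores)
```

This sketch treats every data point as independent, which is exactly the weakness noted in point (iii): if neighboring values are correlated, a malicious SP could spot entries that look statistically unlikely and revert them, erasing part of the mark.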
In this research, the investigators propose novel techniques for robust, privacy- and utility-preserving fingerprinting of correlated data. First, the investigators will show that existing fingerprinting schemes are vulnerable to attacks that exploit the correlations in the data. To mitigate the identified vulnerabilities, they will develop new probabilistic fingerprinting algorithms that are robust against a wide variety of attacks. Furthermore, building on the similarities between the proposed fingerprinting algorithms and privacy-preserving data sharing, the proposed techniques will, for the first time, provide both privacy and robust fingerprinting while sharing data. Specifically, the proposed research thrusts include: (i) in-depth study of the proposed probabilistic fingerprinting algorithms, including formal robustness analysis, studying different correlation models, and improving utility under different utility definitions; (ii) application of the proposed fingerprinting schemes to different data types, such as personal correlated data, databases, and graphs; (iii) developing data sharing metrics and algorithms that provide privacy along with robust fingerprinting by exploring differential privacy and its variants; and (iv) developing algorithms to find the order of data processing that simultaneously optimizes fingerprint robustness, privacy, and utility. In a broader view, the investigators expect the impact of the proposed research to be significant in several areas: (i) on society, by providing tools that identify the sources of unauthorized data leakages with high probability, which will deter malicious SPs from sharing their users’ data without authorization and, because data owners will know they have stronger control over how their data is used and shared, make them more willing to share their data with SPs; (ii) on education and learning, by training graduate, undergraduate, and high school students; and (iii) on broadening participation of underrepresented groups in computing, by recruiting women and members of underrepresented groups to this project.
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2050410&HistoricalAwards=false