Faculty Candidate: Raef Bassily
480 Dreese Labs
2015 Neil Avenue
Columbus, Ohio 43210
Learning from Private Data without Learning Private Data
The dissemination of increasing volumes of personal and sensitive data has become ubiquitous and continuous. Making such data widely available for statistical analysis and machine learning can provide a broad range of benefits. However, the results of analyses of private data can lead to devastating disclosure of sensitive information. We face two seemingly conflicting goals: gaining the benefits of machine learning based on private data, and protecting the privacy of the individuals whose data is collected. How can we achieve both? Straightforward approaches to the privacy problem, such as data anonymization, are at best unreliable: the last decade has seen a string of attacks that recover personal information from supposedly "anonymized" data.
Based on my work, I will show how to efficiently achieve both goals in a rigorous and provable manner. I will describe efficient algorithms, with optimal accuracy and rigorous privacy guarantees, for solving a broad range of machine learning problems. The privacy guarantees of these algorithms take the form of differential privacy, a rigorous notion of statistical data privacy. These algorithms cover various applications in both models of private data analysis: the centralized model (with a trusted curator) and the distributed model (with an untrusted curator). Some of these algorithms have been implemented and deployed in industry. Part of my research has also been devoted to understanding and revealing the connections between the notions of privacy and learning. My work characterizes the role of privacy, as a strong stability condition, in preventing overfitting even when data is analyzed adaptively. I will highlight this fundamental connection between privacy and machine learning, which shows a surprising harmony between these seemingly opposite notions.
Bio: Raef Bassily is currently a Data Science Postdoctoral Fellow in the Department of Computer Science & Engineering and the Center of Information Theory and Applications (ITA) at the University of California, San Diego. Prior to this, he was a postdoctoral scholar in the Department of Computer Science & Engineering at the Pennsylvania State University. His current research focuses on developing practical algorithms for privacy-preserving machine learning and data analysis. His distributed protocols for histogram estimation have recently been deployed in the latest version of Apple’s iOS to enable private crowdsourcing from Apple users. His earlier research focused on developing coding schemes and communication protocols to ensure information-theoretic security in communication networks. He received his Ph.D. in Electrical and Computer Engineering from the University of Maryland, College Park, in 2012.
Host: Tasos Sidiropoulos