Faculty Search Candidate: Thomas Steinke
480 Dreese Labs
2015 Neil Ave, Columbus, Ohio 43210
Protecting Privacy and Guaranteeing Generalization with Algorithmic Stability
As data is being more widely collected and used, privacy and statistical validity are becoming increasingly difficult to protect. Sound solutions are needed, as "ad hoc" approaches have resulted in several high-profile failures.
In this talk, I will illustrate how privacy can be unwittingly compromised -- i.e., how sensitive information can be leaked by seemingly innocuous "anonymized" or aggregate data. I will then show how to avoid these pitfalls using the framework of differential privacy. Differential privacy is an information-theoretic measure of algorithmic stability that translates into a robust privacy guarantee and also permits us to design algorithms that perform sophisticated statistical analyses.
Privacy turns out to be intimately related to generalization in machine learning. In particular, a differentially private algorithm is guaranteed to not "overfit" its data, meaning that any statistical conclusions extend to the underlying distribution from which the data was drawn. I will discuss this connection and explain how it is especially useful for adaptive data analysis, namely when one dataset is used over and over again and each successive analysis is informed by the outcome of previous analyses.
Bio: Thomas Steinke is a postdoctoral researcher at the IBM Almaden Research Center in San Jose, California. In 2016, he graduated from Harvard University with a PhD in Computer Science, advised by Salil Vadhan; prior to that, he completed an MSc and a BSc(Hons) in New Zealand. His research interests include providing rigorous tools for privacy-preserving data analysis and statistically valid adaptive data analysis, as well as pseudorandomness.