Faculty Candidate: Ce Zhang
480 Dreese Labs
2015 Neil Avenue
Columbus, Ohio 43210
DeepDive: A Data Management System for Machine Learning Workloads
Many pressing questions in science are macroscopic: they require scientists to consult information expressed in a wide range of resources, many of which are not organized in a structured relational form. Knowledge base construction (KBC) is the process of populating a knowledge base, i.e., a relational database storing factual information, from unstructured inputs. KBC holds the promise of facilitating a range of macroscopic sciences by making information accessible to scientists. One key challenge in building a high-quality KBC system is that developers must often deal with data that are both diverse in type and large in size. A further complication is that these data must be manipulated with both relational operations and state-of-the-art machine-learning techniques.
My research focuses on building a data management system for machine learning workloads, with the goal of simplifying the complex process of building KBC systems. The system I build, DeepDive, aims to let scientists build KBC systems, and machine learning systems in general, by declaratively specifying domain knowledge without worrying about algorithmic, performance, or scalability issues. DeepDive has been used by users without machine learning expertise in domains ranging from paleobiology to genomics to anti-human trafficking. In this talk, I will describe the DeepDive framework, its applications, and the underlying techniques we developed to speed up a range of machine learning workloads by up to two orders of magnitude.
Ce is a postdoctoral researcher in Computer Science at Stanford University, where he works with Christopher Ré on data management and database systems. With the indispensable help of many collaborators, his PhD work produced DeepDive, a trained data system for automatic knowledge-base construction. As part of his PhD thesis, he led research efforts that won the 2014 SIGMOD Best Paper Award and were invited to the “Best of VLDB 2015” special issue. PaleoDeepDive, a machine-reading system for paleontologists, was featured in Nature magazine. He also led the Stanford team that used DeepDive to produce the top-performing machine-reading system for the TAC-KBP 2014 slot-filling evaluations. Ce obtained his PhD from the University of Wisconsin-Madison, advised by Christopher Ré, and his Bachelor of Science degree from Peking University, advised by Bin Cui.