2022 SIGMOD Research Highlight Award and 2021 BIBM Best Paper Award

Posted: June 16, 2022

CSE is delighted to announce that Prof. Huan Sun’s paper, “TURL: Table Understanding through Representation Learning,” originally published in VLDB 2021, has won the 2022 ACM SIGMOD Research Highlight Award. The paper was co-authored with Xiang Deng (currently a 4th-year Ph.D. candidate advised by Prof. Sun) and collaborators from Google Research: Dr. Alyssa Lees, Dr. You Wu and Dr. Cong Yu.

Another of her papers, “CliniQG4QA: Generating Diverse Questions for Domain Adaptation of Clinical Question Answering,” won the Best Paper Award at the IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2021. This paper was co-authored with Xiang Yue (currently a 4th-year Ph.D. student advised by Prof. Sun), Xinliang Frederick Zhang (then an undergraduate in CSE and now a Ph.D. student at the University of Michigan), Ziyu Yao (then a Ph.D. student in CSE and now an assistant professor at George Mason University), and Simon Lin (then the Chief Research Information Officer at Nationwide Children's Hospital and now at the University of California, Riverside).

Here is a brief introduction to the two papers:

1. TURL [VLDB’21] addresses the important problem of understanding semi-structured tabular data with representations learned via neural models. Tabular data is abundant on the Web and in databases. Owing to the wealth and utility of these data, there has been growing interest in a variety of table understanding tasks. Users often query tables from multiple sources to find the information they need, and compose their own tables to organize that information. However, existing work generally relies on heavily engineered, task-specific features and model architectures, which not only require substantial manual effort to design but are also hard to generalize across tasks. In this paper, the authors introduce the pre-training/fine-tuning paradigm to relational tables, inspired by the success of large pre-trained language models in natural language processing. TURL first learns deep contextualized representations of relational tables through self-supervised pre-training; the pre-trained representations can then be applied to a wide range of tasks with minimal task-specific fine-tuning. This study is a strong example of how to extend representation learning techniques designed for unstructured text to structured data such as tables.

The SIGMOD Research Highlight Award is a highly selective and prestigious award through which the database community showcases a set of research projects that exemplify core database research. In particular, each of these projects addresses an important problem, represents a definitive milestone in solving the problem, and has the potential for significant impact. The award also aims to make the selected works widely known in the database community, to our industry partners, and to the broader ACM community.

The ACM Special Interest Group on Management of Data (SIGMOD) is concerned with the principles, techniques and applications of database management systems and data management technology. Its members include software developers, academic and industrial researchers, practitioners, users, and students. SIGMOD sponsors the annual SIGMOD/PODS conference, one of the most important and selective in the field.

2. CliniQG4QA addresses the important problem of domain adaptation for natural language question answering (QA) models. Medical professionals often query clinical notes in Electronic Medical Records (EMRs) to find information that can support their decision-making. One way to facilitate such clinical information-seeking activities is to train a clinical question answering (CliniQA) system, which takes as input a clinical document (e.g., a patient record) and a natural language question raised by a clinician, and extracts a text span from the document as the answer to the question. However, a QA model trained on one dataset often struggles to generalize to new questions and new contexts. In the paper, the authors propose a simple yet effective framework, CliniQG4QA, which leverages automatic question generation to synthesize QA pairs on new clinical contexts and boosts QA models without requiring manual annotations. This study is a strong example of how to adapt general natural language processing (NLP) techniques to specialized domains such as clinical informatics.

The IEEE International Conference on Bioinformatics and Biomedicine (BIBM) has established itself as the premier research conference in bioinformatics and biomedicine. IEEE BIBM 2021 provides a leading forum for disseminating the latest research in bioinformatics and health informatics. It brings together academic and industrial scientists from computer science, biology, chemistry, medicine, mathematics and statistics.

“We are very excited to receive these awards and it is a great recognition of my students’ hard work and the team collaboration,” said Huan Sun. She has been dedicated to developing natural language interfaces and conversational AI systems with applications to various domains, which requires interdisciplinary effort. She said, “TURL is one of our attempts to bridge the gap between the natural language processing and database community. The techniques introduced in CliniQG4QA could be applied to bootstrap conversational AI interfaces for other domains beyond clinical question answering.”

