Engineering AI research aims to boost computers’ understanding of human language

Posted: August 10, 2018

What’s the story? Give me the scoop. Spill the beans. Human language allows for an infinite number of ways to express the same idea, but these variations can be difficult for a computer to understand. An engineering researcher at The Ohio State University is earning national recognition for her work to solve this linguistic dilemma through natural language processing.

“Natural language processing is a big branch of artificial intelligence (AI). It’s primarily applied in machine learning algorithms to help computers either understand human language or help generate it,” explained Wei Xu, an assistant professor in the Department of Computer Science and Engineering.

Her project—called LanguageNet—aims to tackle the understanding component by building a database of expressions or paraphrases. “We’re focusing not only on words and their synonyms, like CEO versus chief executive officer, but also code phrases and more complicated sentences,” she said. “Especially talking on social media, where language has become super creative. It’s very hard to allocate every single twist—there are so many variations.”

There has been steady progress toward large paraphrase resources, Xu said, and a significant increase in their applications, from information retrieval, information extraction and natural language generation to IBM’s Watson and Google’s Knowledge Graph.

Xu’s research aims to create better paraphrase acquisition techniques and larger-scale semantic resources, which could be of great use in various natural language processing tasks and in social media data analytics for fields such as the social sciences or national security. One potential application is text simplification, which automatically rephrases complex texts into simpler language for children or people with reading disabilities.

Earlier this year, Xu’s LanguageNet team won the Q4 AI for Everyone Challenge from Figure Eight (formerly CrowdFlower), which aims to give machine learning experts the tools and resources to create AI projects that contribute to the greater good. Because deep learning algorithms are data-hungry, Xu’s team will use the company’s resources to annotate more sentence and phrase data in 10 different languages to power their AI project.

“Figure Eight will be doing crowdsourcing and asking internet users to do annotation tasks,” said Xu. “That human knowledge data is what’s important for training automation learning algorithms. It’s necessary for capturing hundreds of thousands of phrases.”

Xu’s innovative research continues to earn recognition. She recently received a $175K CRII award from the National Science Foundation (NSF), which will support building the machine learning algorithms to make use of the human knowledge data she acquires. Additionally, Xu has earned a 2018 Best Paper Award from COLING, one of the top conferences in the field.

by Meggie Biss, College of Engineering Communications | biss.11@osu.edu