Distinguished Guest Speaker: Raymond J. Mooney

Professor in the Department of Computer Science at the University of Texas at Austin
Friday, March 3, 2017, 3:00 pm

Generating Natural-Language Video Descriptions using LSTM Recurrent Neural Networks

We present a method for automatically generating English sentences describing short videos using deep neural networks. Specifically, we apply convolutional and Long Short-Term Memory (LSTM) recurrent networks to translate videos to English descriptions using an encoder/decoder framework. A sequence of image frames (represented using deep visual features) is first mapped to a vector encoding the full video, and then this encoding is mapped to a sequence of words. We have also explored how statistical linguistic knowledge mined from large text corpora, specifically LSTM language models and lexical embeddings, can improve the descriptions. Experimental evaluation on a corpus of short YouTube videos and movie clips annotated by Descriptive Video Service demonstrates the capabilities of the technique by comparing its output to human-generated descriptions.
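
As a rough illustration of the encoder/decoder idea described in the abstract, the following is a minimal sketch and not the speaker's implementation. It assumes NumPy is available, uses randomly initialized (untrained) weights, stands in random vectors for the deep visual features, and decodes greedily. All names here (LSTMCell, encode, decode, the toy vocabulary, and the dimensions) are hypothetical choices made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """A single LSTM cell with random, untrained weights (illustration only)."""
    def __init__(self, input_size, hidden_size, rng):
        # One stacked weight matrix covering the input, forget, output, and candidate gates.
        self.W = rng.standard_normal((4 * hidden_size, input_size + hidden_size)) * 0.1
        self.b = np.zeros(4 * hidden_size)
        self.hidden_size = hidden_size

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update the cell state
        h = sigmoid(o) * np.tanh(c)                   # compute the new hidden state
        return h, c

def encode(frames, cell):
    """Map a sequence of per-frame feature vectors to a single state (h, c)."""
    h = np.zeros(cell.hidden_size)
    c = np.zeros(cell.hidden_size)
    for x in frames:
        h, c = cell.step(x, h, c)
    return h, c

def decode(h, c, cell, embeddings, W_out, vocab, max_len=10):
    """Greedily map the video encoding to a sequence of words."""
    words = []
    token = vocab.index("<bos>")
    for _ in range(max_len):
        h, c = cell.step(embeddings[token], h, c)
        token = int(np.argmax(W_out @ h))  # pick the highest-scoring word
        if vocab[token] == "<eos>":
            break
        words.append(vocab[token])
    return words

rng = np.random.default_rng(0)
feat_dim, hidden, embed_dim = 8, 16, 6  # assumed toy dimensions
vocab = ["<bos>", "<eos>", "a", "man", "is", "playing", "guitar"]

frames = rng.standard_normal((5, feat_dim))  # stand-in for deep visual features of 5 frames
encoder = LSTMCell(feat_dim, hidden, rng)
decoder = LSTMCell(embed_dim, hidden, rng)
embeddings = rng.standard_normal((len(vocab), embed_dim))
W_out = rng.standard_normal((len(vocab), hidden))

h, c = encode(frames, encoder)
caption = decode(h, c, decoder, embeddings, W_out, vocab)
print(caption)
```

Because the weights are untrained, the printed words are arbitrary. The sketch only shows the data flow: frames go in one at a time, the final encoder state summarizes the video, and the decoder emits one word per step until it produces the end marker or reaches the length limit.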

Bio: Raymond J. Mooney is a Professor in the Department of Computer Science at the University of Texas at Austin. He received his Ph.D. in 1988 from the University of Illinois at Urbana/Champaign. He is an author of over 160 published research papers, primarily in the areas of machine learning and natural language processing. He was the President of the International Machine Learning Society from 2008 to 2011, program co-chair for AAAI 2006, general chair for HLT-EMNLP 2005, and co-chair for ICML 1990. He is a Fellow of the American Association for Artificial Intelligence (AAAI), the Association for Computing Machinery (ACM), and the Association for Computational Linguistics (ACL), and the recipient of best paper awards from AAAI-96, KDD-04, ICML-05, and ACL-07.

Host: Wei Xu