A key challenge in natural language understanding is recognizing when two sentences have the same meaning. I'll discuss our work on this problem over the past few years, including our exploration of compositional functional architectures, learning criteria, and naturally-occurring sources of training data. The result is a single sentence-embedding model that outperforms all systems from the 2012-2016 SemEval semantic textual similarity competitions without training on any of the annotated data from those tasks. As a by-product, we developed a large dataset of automatically-generated paraphrase pairs using parallel text and neural machine translation. We've since used the dataset, which we call ParaNMT-50M, to impart a notion of meaning equivalence to controlled text generation tasks, including syntactically-controlled paraphrasing and textual style transfer.
Bio:
Kevin Gimpel is an assistant professor at the Toyota Technological Institute at Chicago (TTIC), a philanthropically endowed academic computer science institute on the campus of the University of Chicago. He was previously a research assistant professor at TTIC from 2012 to 2015, and he received his PhD from the Language Technologies Institute at Carnegie Mellon University in 2012. His research focuses on natural language processing and machine learning. Recent interests include paraphrase recognition, narrative modeling, commonsense knowledge representation, and structured prediction in the era of deep learning. His research has been supported by a Sandia National Laboratories Fellowship and gifts from Google and Bloomberg.