Speaker: Paul Pu Liang (Carnegie Mellon University)
Title: Computational Modeling of Human Multimodal Language
Abstract: Computational modeling of human multimodal language is an emerging research area in natural language processing spanning the language, visual, and acoustic modalities. Comprehending multimodal language requires modeling not only the interactions within each modality (intra-modal interactions) but also, more importantly, the interactions between modalities (cross-modal interactions). Modeling these interactions lies at the core of multimodal language analysis. This talk will describe several recent advances in modeling multimodal language from a machine learning perspective. We will cover models that involve synchronized recurrent networks, tensor products, gating mechanisms, Bayesian ranking algorithms, hybrid generative-discriminative objectives, and robust representation learning via modality translations. From a resource perspective, there is also a genuine need for large-scale datasets that allow for in-depth studies of human multimodal language. We will introduce the CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) dataset, the largest dataset for multimodal sentiment analysis and emotion recognition. The talk will conclude with several open research directions in human language modeling and multimodal machine learning.
Public events of RIKEN Center for Advanced Intelligence Project (AIP)