【Team】Approximate Bayesian Inference Team
【Date】2025/June/3(Tuesday) 15:00-16:00(JST)
【Speaker】Talk by Niyati Rawal, University of Modena and Reggio Emilia
Title: Integration of Vision and Language for Physical and Cognitive Human-Robot Interaction
Abstract:
This talk lies at the intersection of computer vision, natural language processing, and robotics, and delves into the integration of these domains within Human-Robot Interaction (HRI). It introduces innovative approaches for combining vision and language in two key areas of HRI: physical and cognitive interactions. In the realm of physical HRI, the focus is on Vision and Language Navigation (VLN), where an intelligent agent perceives its surroundings and follows human instructions such as “Go to the kitchen and clean the coffee table.” Methodologies are proposed to enhance navigation performance, including augmenting existing datasets and enabling agents to engage in dialogue when uncertain. For cognitive HRI, the emphasis shifts to multimodal empathetic dialogue generation. Here, agents are designed to interpret human expressions and language, responding empathetically to foster meaningful interactions. This includes leveraging Transformer-based architectures to deliver empathetic responses, with one model employing reinforcement learning to generate replies that positively impact human emotions. In both domains, Transformer-based models play a pivotal role, pushing the boundaries of HRI. The research envisions a future where robots combine these skills, seamlessly transitioning between physical tasks like household cleaning and social roles as empathetic companions. Such advancements pave the way for the development of truly intelligent, versatile robots capable of addressing complex human needs.