Recent results on learning with diffusion models

Mon, 22 Apr 2024 13:00 - 14:00 JST
Online (link visible to participants)

Free admission

Description

Abstract: Diffusion models have been successfully applied to text-to-image generation with state-of-the-art performance. In this talk, I will discuss how these models can be used for low-level vision tasks and 3D scenes. First, I will present our findings on exploiting features from diffusion models and transformers for zero-shot semantic correspondence and other applications. Next, I will describe how we exploit diffusion models as an effective prior for dense prediction tasks such as surface normal estimation, depth estimation, and segmentation. I will then discuss how diffusion models can facilitate articulated 3D reconstruction, 3D scene generation, and novel view synthesis. Time permitting, I will present other results on fine-grained text-to-image generation and pixel-wise visual grounding of large multimodal models.

Bio: Ming-Hsuan Yang is a Professor at UC Merced and a Research Scientist with Google. He received the Google Faculty Award in 2009 and the CAREER Award from the National Science Foundation in 2012. Yang received paper awards at UIST 2017, CVPR 2018, and ACCV 2018, and the Longuet-Higgins Prize at CVPR 2023. He is an Associate Editor-in-Chief of PAMI, Editor-in-Chief of CVIU, and an Associate Editor of IJCV. He served as Program Chair for ACCV 2014 and ICCV 2019, and as Senior Area Chair/Area Chair for CVPR, ICCV, ECCV, NeurIPS, ICLR, ICML, IJCAI, and AAAI. Yang is a Fellow of the IEEE and the ACM.

About this community

RIKEN AIP Public

Public events of RIKEN Center for Advanced Intelligence Project (AIP)
