Doorkeeper

Imperfect Information Learning Team Seminar (Talk by Qizhou Wang, Hong Kong Baptist University).

Fri, 17 Jan 2025 16:00 - 17:00 JST
Online Link visible to participants

Registration is closed


Free admission

Description

This is an online seminar. Registration is required.
【Team】Imperfect Information Learning Team
【Date】2025/January/17 (Fri) 16:00-17:00 (JST)
【Speaker】Qizhou Wang, Hong Kong Baptist University

Title: Towards Detecting and Removing Undesirable Model Behaviors from Trained Models

Abstract:
Although AI and machine learning methods have achieved great success in practical applications, there are widespread concerns regarding undesirable behaviors observed in these systems. Such behaviors can have severe consequences, posing risks to the safe, reliable, and lawful deployment and usage of modern machine learning systems. In this presentation, we will explore techniques to prevent undesirable model behaviors, considering their various contributing sources and discussing strategies to mitigate their effects. For classification models, our discussion centers on out-of-distribution (OOD) detection, where we explore methods to identify undesirable behaviors that may arise from OOD inputs. We focus on enhancing the efficacy of OOD detection in resource-constrained settings, and we examine the real-world reliability of more advanced methods based on outlier exposure.

For generative models, our discussion focuses on large language model (LLM) unlearning, which involves practical techniques for removing undesirable knowledge parameterized in LLMs. We primarily concentrate on assessing and quantifying the performance of unlearning methods, which enables reliable comparisons across different methods and facilitates effective hyper-parameter tuning. We observe that efficacy in unlearning often comes with notable negative impacts on overall model performance. Furthermore, some baseline methods, such as gradient difference, have been shown to outperform more advanced methods when their hyper-parameters are properly tuned. We also test how various experimental setups affect unlearning performance, finding that many basic strategies notably improve unlearning efficacy. Our observations and analysis deepen the understanding of LLM unlearning and may inspire further research that enhances its efficacy.
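To make the two objectives mentioned in the abstract concrete, the sketch below shows common textbook formulations of outlier exposure (cross-entropy on in-distribution data plus a term pushing outlier predictions toward uniform) and the gradient-difference unlearning baseline (ascend on the forget set, descend on the retain set). This is only an illustrative sketch, not the speaker's actual formulation; the weights `lam` and `alpha` are hypothetical balancing hyper-parameters.

```python
import math

def outlier_exposure_loss(in_dist_ce, outlier_probs, lam=0.5):
    """Outlier-exposure style objective (illustrative sketch).

    in_dist_ce: cross-entropy loss already computed on in-distribution data.
    outlier_probs: the model's predicted class probabilities on an
        auxiliary outlier sample; the penalty pushes them toward uniform.
    lam: hypothetical weight balancing the two terms.
    """
    k = len(outlier_probs)
    # Cross-entropy between the uniform distribution over k classes
    # and the model's predictions on the outlier sample.
    uniform_ce = -sum((1.0 / k) * math.log(p) for p in outlier_probs)
    return in_dist_ce + lam * uniform_ce

def gradient_difference_loss(forget_loss, retain_loss, alpha=1.0):
    """Gradient-difference unlearning objective (illustrative sketch).

    Minimizing this ascends the loss on the forget set (removing the
    targeted knowledge) while descending on the retain set (preserving
    overall model performance). alpha is a hypothetical trade-off weight.
    """
    return -forget_loss + alpha * retain_loss
```

As the abstract notes, tuning the trade-off weight in such a baseline is exactly the kind of hyper-parameter choice that the talk's evaluation methodology is meant to support.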

About this community

RIKEN AIP Public

Public events of RIKEN Center for Advanced Intelligence Project (AIP)
