Registration is closed
The TrustML Young Scientist Seminars (TrustML YSS) started on January 28, 2022.
The TrustML YSS is a video series featuring young scientists giving talks on their research and discoveries related to Trustworthy Machine Learning.
For more information, please see the following site.
This network is funded by RIKEN-AIP's subsidy and JST, ACT-X Grant Number JPMJAX21AF, Japan.
【The 32nd Seminar】
Date and Time: September 16th, 9:00 am - 11:00 am (JST)
Venue: Zoom webinar
Speaker: Chirag Gupta (Carnegie Mellon University)
Title: Provably calibrating ML classifiers without distributional assumptions
Abstract: Most ML classifiers provide probability scores for the different classes. What do these scores mean? Probabilistic classifiers are said to be calibrated if the observed frequencies of labels match the claimed/reported probabilities. While calibration in the binary classification setting has been studied since the mid-1900s, there is less clarity on the right notion of calibration for multiclass classification. In this talk, I will present recent work in which we investigate the relationship between commonly considered notions of multiclass calibration and the calibration algorithms used to achieve them. We will discuss our proposed notion of top-label calibration and the general framework of multiclass-to-binary (M2B) calibration. We show that any M2B notion of calibration can be provably achieved, no matter how the data is distributed. I will present these calibration guarantees as well as experimental results on calibrating deep learning models. Our proposed algorithms beat existing algorithms in most situations.
Code: https://github.com/aigen/df-posthoc-calibration
Main paper: https://arxiv.org/abs/2107.08353 (ICLR 2022)
Additional relevant papers: https://arxiv.org/abs/2105.04656 (ICML 2021), https://arxiv.org/abs/2006.10564 (NeurIPS 2020), https://arxiv.org/abs/2204.13087 (COLT 2022)
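To make the idea of post-hoc, distribution-free calibration concrete, here is a minimal sketch of top-label histogram binning: recalibrate only the confidence reported for the predicted (top) class, replacing it with the empirical accuracy observed in a confidence bin on a held-out calibration set. This is an illustrative simplification under stated assumptions, not the authors' implementation; all function names are hypothetical.

```python
import numpy as np

def toplabel_histogram_binning(scores, labels, n_bins=10):
    """Fit histogram binning on top-label confidences.

    scores: (n, K) array of probability scores on a calibration set
    labels: (n,) array of true class indices
    Returns (bin edges, per-bin calibrated confidences).
    """
    preds = scores.argmax(axis=1)           # top-label predictions
    confs = scores.max(axis=1)              # reported top-label confidences
    correct = (preds == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # assign each confidence to a bin; clip so conf = 1.0 lands in the last bin
    idx = np.clip(np.digitize(confs, edges) - 1, 0, n_bins - 1)
    calibrated = np.empty(n_bins)
    for b in range(n_bins):
        mask = idx == b
        # empirical accuracy within the bin; fall back to the bin midpoint
        calibrated[b] = correct[mask].mean() if mask.any() else (edges[b] + edges[b + 1]) / 2
    return edges, calibrated

def apply_binning(scores, edges, calibrated):
    """Map fresh top-label confidences to their calibrated values."""
    confs = scores.max(axis=1)
    idx = np.clip(np.digitize(confs, edges) - 1, 0, len(calibrated) - 1)
    return calibrated[idx]
```

Because the output confidences are empirical frequencies over bins, the calibration guarantee holds without assumptions on the data distribution, which is the flavor of result the talk discusses for general M2B notions.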
Speaker: Qiongkai Xu (University of Melbourne)
Title: Humanly Certify Superhuman Classifiers
Abstract: Estimating the performance of a machine learning system is a longstanding challenge in artificial intelligence research. Today, this challenge is especially relevant given the emergence of systems showing increasing evidence of outperforming human beings. In some cases, this "superhuman" performance is readily demonstrated; for example, by defeating top-tier human players in traditional two-player games. On the other hand, it can be challenging to evaluate classification models that potentially surpass human performance. Indeed, human annotations are often treated as a ground truth, which implicitly assumes the superiority of the human over any models trained on human annotations. In reality, human annotators are subjective and can make mistakes. Evaluating performance with respect to a genuine oracle is more objective and reliable, even when querying the oracle is expensive or sometimes impossible. In this paper, we first raise the challenge of evaluating the performance of both humans and models with respect to an oracle which is unobserved. We develop a theory for estimating the accuracy compared to the oracle, using only imperfect human annotations for reference. Our analysis provides a simple recipe for detecting and certifying superhuman performance in this setting, which we believe will assist in understanding the stage of current research on classification. We validate the convergence of the bounds and the assumptions of our theory on carefully designed toy experiments with known oracles.
Moreover, we demonstrate the utility of our theory by meta-analyzing large-scale natural language processing tasks, for which an oracle does not exist, and show that under our mild assumptions a number of models from recent years have already achieved superhuman performance with high probability, suggesting that our new oracle-based performance evaluation metrics are overdue as an alternative to the widely used accuracy metrics that are naively based on imperfect human annotations.
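The core difficulty described above can be illustrated with a toy experiment of the kind the abstract mentions, where the oracle is known by construction. The sketch below simulates an oracle, an imperfect human annotator, and a model that is more accurate than the annotator, and shows that accuracy measured against the noisy human labels understates the model's true accuracy against the oracle. This is a hand-rolled illustration of the evaluation problem, not the paper's estimator; the error rates and label-noise model (independent symmetric flips) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
oracle = rng.integers(0, 2, size=n)        # unobserved ground-truth labels

def corrupt(y, error_rate):
    """Flip binary labels independently with probability error_rate."""
    flips = rng.random(y.size) < error_rate
    return np.where(flips, 1 - y, y)

human = corrupt(oracle, 0.15)              # imperfect human annotations
model = corrupt(oracle, 0.05)              # a classifier better than the annotator

acc_vs_oracle = (model == oracle).mean()   # true (normally unobservable) accuracy
acc_vs_human = (model == human).mean()     # what naive evaluation reports

# Against noisy human labels, a superhuman model looks worse than it is:
# agreement requires both to be right (0.95 * 0.85) or both wrong (0.05 * 0.15).
assert acc_vs_oracle > acc_vs_human
```

Under independent noise the expected agreement with the human labels is 0.95 x 0.85 + 0.05 x 0.15 = 0.815, well below the model's true accuracy of 0.95, which is why an oracle-based analysis of the kind the paper develops is needed to certify superhuman performance.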
All participants are required to agree with the AIP Seminar Series Code of Conduct.
Please see the URL below.
RIKEN AIP will expect adherence to this code throughout the event. We expect cooperation from all participants to help ensure a safe environment for everybody.
Public events of RIKEN Center for Advanced Intelligence Project (AIP)