Talk event: Learning theory of loss functions

Tue, 15 Dec 2020 09:00 - 11:00 (JST)
Online (link visible to participants)
Free admission

Description

This online event consists of talks by three Ph.D. students. The main focus is the theory of loss functions and their design in discrete prediction tasks. The order of the talks may change. Each talk will be 30 minutes with a Q&A session.
The event will be held on Dec 14, 05:00 PM - 07:00 PM (MST) = Dec 14, 07:00 PM - 09:00 PM (EST) = Dec 15, 09:00 AM - 11:00 AM (JST).

---------- [Talk 1] ----------
Speaker: Han Bao (UTokyo, https://hermite.jp/)

Title: Calibrated surrogate losses for adversarially robust classification

Abstract: Adversarially robust classification seeks a classifier that is insensitive to adversarial perturbations of test patterns. This problem is often formulated via a minimax objective, where the target loss is the worst-case value of the 0-1 loss subject to a bound on the size of perturbation. Recent work has proposed convex surrogates for the adversarial 0-1 loss, in an effort to make optimization more tractable. In this work, we consider the question of which surrogate losses are calibrated with respect to the adversarial 0-1 loss, meaning that minimization of the former implies minimization of the latter. We show that no convex surrogate loss is calibrated with respect to the adversarial 0-1 loss when restricted to the class of linear models. We further introduce a class of nonconvex losses and offer necessary and sufficient conditions for losses in this class to be calibrated.

References:
[BSS COLT20] http://proceedings.mlr.press/v125/bao20a.html

---------- [Talk 2] ----------
Speaker: Jessie Finocchiaro (CU Boulder, https://jfinocchiaro.github.io/)

Title: Property elicitation as a tool for consistent surrogate loss functions

Abstract: Given a prediction task, understanding when one can and cannot design a consistent convex surrogate loss, particularly a low-dimensional one, is an important and active area of machine learning research. While calibration has historically been used to reason about consistency, we propose indirect property elicitation as an alternative necessary condition for a surrogate loss to be consistent. Motivated by structured prediction and other domains where the prediction dimension of the surrogate is of central importance, we give a novel lower bound on the prediction dimension.

Our lower bound tightens existing results in the case of discrete predictions, namely the feasible subspace dimension, showing that previous calibration-based bounds can largely be recovered purely via property elicitation, and even embeddings [FFW NeurIPS19, FFW COLT20]. For continuous predictions, our lower bound gives new results for variance estimation as well as the estimation of entropy and norms of the conditional distribution.

References:
[FFW NeurIPS19] https://arxiv.org/abs/1907.07330
[FFW COLT20] http://proceedings.mlr.press/v125/finocchiaro20a.html

---------- [Talk 3] ----------
Speaker: Yutong Wang (UMich, https://web.eecs.umich.edu/~yutongw/)

Title: Multiclass support vector machines and ordered partitions

Abstract: Classification is a central problem in supervised learning, where the goal is to learn a decision function that accurately assigns labels to instances. The support vector machine (SVM) is a learning algorithm that is popular in practice and also has strong theoretical properties. However, most of the theory developed is for the binary classification setting, where there are only two possible labels to choose from. Our work is concerned with the multiclass setting, where there are three or more possible labels for the decision function to choose from. Multiclass SVMs have been formulated in a variety of ways. A recent empirical study by Doğan et al. compared nine such formulations and recommended the variant proposed by Weston and Watkins (WW). Despite the superior empirical performance of the WW multiclass SVM, its theoretical properties remain poorly understood. Towards bridging this gap, we establish a connection between the hinge loss used in the WW multiclass SVM and ordered partitions. We use this connection to justify the recent empirical findings.

References:
[WS NeurIPS20] https://arxiv.org/abs/2006.07346

About this community

RIKEN AIP Public

Public events of the RIKEN Center for Advanced Intelligence Project (AIP)