Talk event: Learning theory of loss functions

Tue, 15 Dec 2020 09:00 - 11:00 JST
Online (link visible to participants)

Registration is closed


Free admission


This online event consists of talks by three Ph.D. students. The main focus is the theory of loss functions and their designs in discrete prediction tasks. The order of the talks may change. Each talk will be 30 minutes with Q&A sessions.
The event will be held on Dec 14, 05:00 PM - 07:00 PM (MST) = Dec 14, 07:00 PM - 09:00 PM (EST) = Dec 15, 09:00 AM - 11:00 AM (JST).

---------- [Talk 1] ----------
Speaker: Han Bao (UTokyo)

Title: Calibrated surrogate losses for adversarially robust classification

Abstract: Adversarially robust classification seeks a classifier that is insensitive to adversarial perturbations of test patterns. This problem is often formulated via a minimax objective, where the target loss is the worst-case value of the 0-1 loss subject to a bound on the size of the perturbation. Recent work has proposed convex surrogates for the adversarial 0-1 loss, in an effort to make optimization more tractable. In this work, we consider the question of which surrogate losses are calibrated with respect to the adversarial 0-1 loss, meaning that minimization of the former implies minimization of the latter. We show that no convex surrogate loss is calibrated with respect to the adversarial 0-1 loss when restricted to the class of linear models. We further introduce a class of nonconvex losses and offer necessary and sufficient conditions for losses in this class to be calibrated.
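To make the target loss concrete, here is a minimal sketch of the adversarial 0-1 loss for a single linear classifier f(x) = <w, x> + b. It assumes L2-bounded perturbations of radius eps, counts a margin of exactly 0 as an error, and requires w to be nonzero; these choices are illustrative and not taken from the talk. For an L2 ball, the worst-case margin has the closed form y*f(x) - eps*||w||_2, and the sketch checks that an explicit worst-case perturbation agrees with it. The surrogate losses and calibration results from the talk are not reproduced here.

```python
import math


def linear_score(w, b, x):
    """Score f(x) = <w, x> + b of a linear classifier."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def zero_one(w, b, x, y):
    """Standard 0-1 loss; a margin of exactly 0 counts as an error (assumed convention)."""
    return 1 if y * linear_score(w, b, x) <= 0 else 0


def worst_case_perturbation(w, y, eps):
    """Margin-minimizing perturbation in an L2 ball of radius eps (assumes w != 0)."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return [-eps * y * wi / norm_w for wi in w]


def adversarial_zero_one(w, b, x, y, eps):
    """Worst-case 0-1 loss: 1 iff the worst-case margin y*f(x) - eps*||w||_2 is <= 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    worst_margin = y * linear_score(w, b, x) - eps * norm_w
    return 1 if worst_margin <= 0 else 0


w, b = [1.0, 0.0], 0.0   # hypothetical linear classifier
x, y = [0.5, 0.0], 1     # hypothetical test point with margin 0.5
print(zero_one(w, b, x, y))                   # clean point is classified correctly
print(adversarial_zero_one(w, b, x, y, 0.3))  # margin 0.5 survives a 0.3 perturbation
print(adversarial_zero_one(w, b, x, y, 0.6))  # a 0.6 perturbation can flip the label
delta = worst_case_perturbation(w, y, 0.6)
x_adv = [xi + di for xi, di in zip(x, delta)]
print(zero_one(w, b, x_adv, y))               # the explicit attack agrees with the closed form
```

The closed form follows because the inner minimum of y*<w, delta> over ||delta||_2 <= eps is -eps*||w||_2, attained at delta = -eps*y*w/||w||_2.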


---------- [Talk 2] ----------
Speaker: Jessie Finocchiaro (CU Boulder)

Title: Property elicitation as a tool for consistent surrogate loss functions

Abstract: We formalize and study the natural approach of designing convex surrogate loss functions via embeddings for problems such as classification or ranking. In this approach, one embeds each of the finitely many predictions (e.g., classes) as a point in R^d, assigns the original loss values to these points, and convexifies the loss in between to obtain a surrogate. We prove that this approach is equivalent, in a strong sense, to working with polyhedral (piecewise linear convex) losses. We also study the prediction dimension d: for which target losses can it be lowered, and how, without sacrificing consistency?

[FFW NeurIPS19]
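As a small illustration of the embedding idea, the sketch below uses a standard binary example chosen here for illustration rather than taken from the talk: the two predictions {-1, +1} are embedded as the points -1 and +1 in R^1, and the hinge loss (a polyhedral surrogate) is checked against the 0-1 loss at those points, where it equals twice the 0-1 loss. It then minimizes the expected hinge loss on a grid for a few label distributions p = P(Y = +1), with p != 1/2 assumed, and maps the minimizer back to a class with a sign link. The grid search and the names `EMBED` and `link` are illustrative assumptions.

```python
def hinge(u, y):
    """Binary hinge loss on a real-valued prediction u in R^1."""
    return max(0.0, 1.0 - u * y)


def zero_one(r, y):
    """Discrete target loss on the finite prediction set {-1, +1}."""
    return 0.0 if r == y else 1.0


# Hypothetical embedding of each discrete prediction as a point in R^d, with d = 1.
EMBED = {+1: 1.0, -1: -1.0}
LABELS = (+1, -1)


def hinge_matches_scaled_zero_one():
    """At the embedded points, hinge equals 2 * (0-1 loss) for every prediction/label pair."""
    return all(hinge(EMBED[r], y) == 2.0 * zero_one(r, y) for r in LABELS for y in LABELS)


def expected_hinge(u, p):
    """Expected hinge loss when P(Y = +1) = p."""
    return p * hinge(u, +1) + (1.0 - p) * hinge(u, -1)


def minimize_on_grid(p, grid):
    """Brute-force minimizer of the expected surrogate over a 1-D grid."""
    return min(grid, key=lambda u: expected_hinge(u, p))


def link(u):
    """Map a surrogate prediction back to a discrete class."""
    return +1 if u >= 0 else -1


grid = [i / 100 for i in range(-200, 201)]
print(hinge_matches_scaled_zero_one())
for p in (0.7, 0.2):
    u_star = minimize_on_grid(p, grid)
    print(p, u_star, link(u_star))
```

For p != 1/2, the expected hinge loss on [-1, 1] is 1 + u(1 - 2p), which is linear in u, so the minimizer lands on the embedded point of the more likely label. Decoding it with the link recovers the Bayes-optimal class.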

---------- [Talk 3] ----------
Speaker: Yutong Wang (UMich)

Title: Multiclass support vector machines and ordered partitions

Abstract: Classification is a central problem in supervised learning, where the goal is to learn a decision function that accurately assigns labels to instances. The support vector machine (SVM) is a learning algorithm that is popular in practice and also has strong theoretical properties. However, most of the theory developed is for the binary classification setting, where there are only two possible labels to choose from. Our work is concerned with the multiclass setting, where there are three or more possible labels for the decision function to choose from. Multiclass SVMs have been formulated in a variety of ways. A recent empirical study by Doǧan et al. compared nine such formulations and recommended the variant proposed by Weston and Watkins (WW). Despite the superior empirical performance of the WW multiclass SVM, its theoretical properties remain poorly understood. Towards bridging this gap, we establish a connection between the hinge loss used in the WW multiclass SVM and ordered partitions. We use this connection to justify the recent empirical findings.

[WS NeurIPS20]
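For readers unfamiliar with the WW formulation, the hinge loss it uses is commonly written as L(v, y) = sum over j != y of max(0, 1 - (v_y - v_j)), where v is the vector of class scores. The minimal sketch below implements that formula in plain Python with hypothetical scores. It shows that the loss vanishes only when the true class beats every other class by a margin of at least 1, and that with two classes it reduces to the binary hinge loss on v_0 - v_1. The ordered-partition analysis from the talk is not reproduced here.

```python
def ww_hinge(scores, y):
    """Weston-Watkins multiclass hinge loss: sum over j != y of max(0, 1 - (v_y - v_j))."""
    return sum(
        max(0.0, 1.0 - (scores[y] - scores[j]))
        for j in range(len(scores))
        if j != y
    )


def predict(scores):
    """Decision function: the label with the largest score."""
    return max(range(len(scores)), key=lambda k: scores[k])


v = [2.0, 0.5, 1.5]       # hypothetical scores for three classes
print(ww_hinge(v, 0))     # 0.5: only the margin against class 2 (0.5) is below 1
print(ww_hinge(v, 1))     # 4.5: class 1 is beaten by both other classes
print(predict(v))         # 0
# With two classes, the loss matches the binary hinge loss on the margin v_0 - v_1.
a, c = 0.3, -0.4
print(ww_hinge([a, c], 0) == max(0.0, 1.0 - (a - c)))
```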

About this community



Public events of RIKEN Center for Advanced Intelligence Project (AIP)
