This talk will be held in a hybrid format: in person at the AIP Open Space of RIKEN AIP (Nihonbashi office) and online via Zoom. AIP Open Space: *available to AIP researchers only.
DATE & TIME
March 15, 2024: 11:00 am - 12:00 pm (JST)
TITLE
From Sparse Modeling to Sparse Communication
SPEAKER
Prof. André F. T. Martins
ABSTRACT
Neural networks and other machine learning models compute continuous representations, while humans communicate mostly through discrete symbols. Reconciling these two forms of communication is desirable for generating human-readable interpretations or learning discrete latent variable models, while maintaining end-to-end differentiability.
In the first part of the talk, I describe how sparse modeling techniques can be extended and adapted to facilitate sparse communication in neural models. The building block is a family of sparse transformations induced by Tsallis entropies, called alpha-entmax, which is a drop-in replacement for softmax and contains sparsemax as a particular case. Entmax transformations are differentiable and, unlike softmax, can return sparse probability distributions, which is useful for building interpretable attention mechanisms. Variants of these sparse transformations have been successfully applied to machine translation, natural language inference, visual question answering, Hopfield networks, reinforcement learning, and other tasks.
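As a concrete illustration (not part of the talk materials): below is a minimal NumPy sketch of sparsemax, the alpha = 2 member of the alpha-entmax family, showing how it can return exact zeros where softmax cannot. The function names and example logits are illustrative choices, not code from the speaker's work.

    import numpy as np

    def softmax(z):
        # Dense baseline: every coordinate receives strictly positive probability.
        e = np.exp(z - z.max())
        return e / e.sum()

    def sparsemax(z):
        # Euclidean projection of z onto the probability simplex
        # (alpha-entmax with alpha = 2); can return exact zeros.
        z = np.asarray(z, dtype=float)
        z_sorted = np.sort(z)[::-1]
        cssv = np.cumsum(z_sorted)
        k = np.arange(1, z.size + 1)
        support = 1 + k * z_sorted > cssv      # coordinates kept in the support
        k_z = k[support][-1]                   # support size
        tau = (cssv[support][-1] - 1) / k_z    # threshold subtracted from all logits
        return np.maximum(z - tau, 0.0)

    z = np.array([0.7, 0.5, -0.2, 0.1])
    print(softmax(z))    # all entries strictly positive
    print(sparsemax(z))  # [0.6 0.4 0.  0. ] -- a sparse probability distribution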
In the second part, I show how sparse transformations can also be used to design new loss functions, replacing the cross-entropy loss. To this end, I will introduce the family of Fenchel-Young losses, revealing connections between generalized entropy regularizers and separation margin. I illustrate with applications in natural language generation, morphology, and machine translation. If time permits, I will discuss connections to Bregman divergences and the Bayesian learning rule.
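For reference (the standard definition from the speaker's work with Blondel and Niculae, stated here for the audience's convenience): given a convex regularizer Ω with convex conjugate Ω*, the Fenchel-Young loss generated by Ω is

    L_Ω(θ; y) = Ω*(θ) + Ω(y) - ⟨θ, y⟩.

Choosing Ω as the negative Shannon entropy on the simplex recovers the cross-entropy loss (with softmax as the predictive map), while Ω(p) = ½‖p‖² yields the sparsemax loss.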
In the third part, I introduce mixed random variables, which lie in between the discrete and continuous worlds. I build rigorous theoretical foundations for these hybrids via a new "direct sum" base measure defined on the face lattice of the probability simplex. From this measure, I introduce new entropy and Kullback-Leibler divergence functions that subsume the discrete and differential cases and have interpretations in terms of code optimality. This framework suggests two strategies for representing and sampling mixed random variables: an extrinsic one ("sample-and-project") and an intrinsic one (based on face stratification).
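A small illustrative sketch of the extrinsic idea, under the assumption that continuous scores are projected onto the simplex with sparsemax (the details below are not taken from the talk): sampling Gaussian scores and projecting them yields a mixed random variable, which lands on lower-dimensional faces of the simplex with positive probability.

    import numpy as np

    def sparsemax(z):
        # Projection onto the probability simplex; outputs can hit a face exactly.
        z_sorted = np.sort(z)[::-1]
        cssv = np.cumsum(z_sorted)
        k = np.arange(1, z.size + 1)
        k_z = k[1 + k * z_sorted > cssv][-1]
        tau = (cssv[k_z - 1] - 1) / k_z
        return np.maximum(z - tau, 0.0)

    rng = np.random.default_rng(0)
    # "Sample-and-project": draw continuous scores, then project onto the simplex.
    samples = np.array([sparsemax(rng.normal(size=3)) for _ in range(10_000)])
    on_face = (samples == 0.0).any(axis=1).mean()
    print(f"share of samples on a proper face of the simplex: {on_face:.2f}")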
BIOGRAPHY
André F. T. Martins (PhD 2012, Carnegie Mellon University and Instituto Superior Técnico; https://andre-martins.github.io/) is an Associate Professor at Instituto Superior Técnico, University of Lisbon, a researcher at Instituto de Telecomunicações, and the VP of AI Research at Unbabel. His research, funded by an ERC Starting Grant (DeepSPIN) and a Consolidator Grant (DECOLLAGE), among other grants, includes machine translation, quality estimation, and structure and interpretability in deep learning systems for NLP. His work has received several paper awards at ACL conferences. He co-founded and co-organizes the Lisbon Machine Learning School (LxMLS), and he is a Fellow of the ELLIS society and co-director of the ELLIS Program in Natural Language Processing.
Public events of RIKEN Center for Advanced Intelligence Project (AIP)