This is an online seminar. Registration is required.
【Continuous Optimization Team】
【Date】October 26, 2022 (Wed) 10:00-11:00 (JST)
【Speaker】Jan Harold Mercado Alcantara, Academia Sinica, Taipei, Taiwan
Title: Hyperparameter Learning and Sparse Optimization: Theory and Algorithms
Abstract
Central to machine learning tasks is the minimization of a loss function that quantifies the prediction error incurred when a chosen model class is fit to a given dataset. To obtain a model with a suitable level of complexity and a desirable structure, a regularizer is added to the loss function, often resulting in a nonsmooth and nonconvex objective function. In this talk, I will focus on optimization problems that require sparse solutions, thereby selecting only a subset of the features in a given dataset as significant predictors. Specifically, this talk addresses two important issues: (i) how to develop fast algorithms with theoretical guarantees for these nonconvex sparse optimization problems with a prespecified regularization parameter, and (ii) how to systematically tune the weight of the regularization term (that is, the regularization parameter), a task also known as the hyperparameter learning problem.
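For readers unfamiliar with the setup, the two problem templates below illustrate the kind of formulations discussed in the talk; the loss ℓ, regularizer r, parameter λ, and sparsity level s are generic notation introduced here, not taken from the abstract.

```latex
% Regularized formulation: lambda > 0 weights a sparsity-inducing regularizer r
\min_{w \in \mathbb{R}^n} \; \ell(w) + \lambda\, r(w)

% Sparsity-constrained formulation: at most s features may be selected
\min_{w \in \mathbb{R}^n} \; \ell(w) \quad \text{subject to} \quad \|w\|_0 \le s
```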
For the first issue, we consider two nonconvex formulations of sparse optimization problems. For problems demanding a prescribed sparsity level, we attack the nonconvex sparsity-constrained problem directly through an accelerated projected gradient algorithm that combines a novel extrapolation technique with a highly efficient smooth optimization method. For nonconvex relaxations of sparsity-constrained optimization, on the other hand, we propose a smoothing strategy and solve the resulting problem efficiently with a conjugate gradient algorithm. These algorithms are not only equipped with strong theoretical convergence guarantees but also significantly outperform the state of the art in practice.
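As a rough illustration of the projected-gradient idea (not the speaker's algorithm, whose extrapolation technique and convergence analysis are the novel contributions), the sketch below applies hard-thresholding projection with a simple momentum-style extrapolation to a least-squares loss; all function names and parameters are hypothetical.

```python
import numpy as np

def hard_threshold(z, s):
    """Project z onto the s-sparse set: keep the s largest-magnitude entries."""
    w = np.zeros_like(z)
    keep = np.argsort(np.abs(z))[-s:]
    w[keep] = z[keep]
    return w

def extrapolated_projected_gradient(A, b, s, beta=0.9, iters=500):
    """Projected gradient with extrapolation for
    min 0.5*||A x - b||^2  subject to  ||x||_0 <= s  (illustrative sketch only)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    x_prev = x.copy()
    for _ in range(iters):
        y = x + beta * (x - x_prev)             # extrapolation (momentum) point
        grad = A.T @ (A @ y - b)                # gradient of the smooth loss at y
        x_prev, x = x, hard_threshold(y - step * grad, s)
    return x

# tiny synthetic check
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 120))
x_true = np.zeros(120); x_true[:5] = rng.standard_normal(5)
x_hat = extrapolated_projected_gradient(A, A @ x_true, s=5)
print("recovery error:", np.linalg.norm(x_hat - x_true))
```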
For the second issue, we consider a bilevel optimization reformulation of the hyperparameter learning problem. The bilevel reformulation consists of two levels of optimization: the upper level minimizes the validation error, while the lower level solves the regularized problem for a given hyperparameter. A unified smoothing strategy for solving the bilevel problem is presented, allowing flexibility in the choice of smoothing function and regularizer. New necessary optimality conditions under milder constraint qualifications are proposed, and our smoothing algorithm is guaranteed to converge to points satisfying these conditions. Numerical comparisons demonstrate the applicability and advantages of the approach.
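A generic bilevel template of the kind described above is sketched below; the symbols (validation loss L_val, training loss L_train, regularizer r, hyperparameter λ) are illustrative notation, not the talk's.

```latex
% Upper level: choose the hyperparameter lambda that minimizes validation error
% Lower level: for that lambda, w(lambda) solves the regularized training problem
\min_{\lambda \ge 0}\; L_{\mathrm{val}}\bigl(w(\lambda)\bigr)
\quad \text{where} \quad
w(\lambda) \in \operatorname*{arg\,min}_{w} \; L_{\mathrm{train}}(w) + \lambda\, r(w)
```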