WC-02: Advances in mathematical optimization for machine learning and data analysis – Part I
Stream: Advances in mathematical optimization for machine learning
Room: Turing
Chair(s): Le Thi Khanh Hien

An adaptive subsampled Hessian-free optimization method for statistical learning
Jeremy Rieussec, Fabian Bastin, Jean Laprés-Chartrand, Loïc Shi-Garrier
We consider nonconvex statistical learning problems and propose a variable sample-path method in which the sample size is dynamically updated to ensure a decrease of the true objective function with high probability. We integrate this strategy into a subsampled Hessian-free trust-region method with truncated conjugate gradient, relying on outer-product approximations. The approach is compared to various adaptive sample approximation algorithms and stochastic approximation methods popular in machine learning. Its efficiency is illustrated on several large-scale datasets.
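
A minimal sketch of this kind of adaptive subsampling rule, assuming a simple variance-based test that enlarges the sample until the observed decrease at a trial point is statistically significant; the function names, threshold, and growth factor are illustrative assumptions, not the method presented in the talk.

import numpy as np

def decrease_is_significant(losses_old, losses_new, z=1.645):
    # Accept the trial point only if the average per-sample decrease exceeds
    # z standard errors, i.e. the true objective decreases with high probability.
    diffs = losses_old - losses_new
    mean_dec = diffs.mean()
    std_err = diffs.std(ddof=1) / np.sqrt(len(diffs))
    return mean_dec > z * std_err

def adapt_sample_size(sample_loss, x_old, x_new, n, n_max, growth=2.0):
    # Grow the subsample until the estimated decrease is trusted or the full
    # dataset (n_max samples) is used. sample_loss(x, idx) must return the
    # per-sample losses of iterate x on the samples indexed by idx.
    while n < n_max:
        idx = np.random.choice(n_max, size=n, replace=False)
        if decrease_is_significant(sample_loss(x_old, idx), sample_loss(x_new, idx)):
            return n
        n = min(int(growth * n), n_max)
    return n_max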

An Inertial Newton Algorithm for Deep Learning
Camille Castera, Jerome Bolte, Cédric Févotte, Edouard Pauwels
We introduce a new second-order inertial optimization method for machine learning. It exploits the geometry of the loss function while requiring only stochastic approximations of the function values and the generalized gradients. The algorithm combines gradient-descent and Newton-like behaviors with inertia. We prove the convergence of the algorithm for most deep learning problems using tame optimization and dynamical-systems theory. Additionally, we discuss and address the existence of spurious stationary points that arise when mini-batch methods are applied to nonsmooth problems.
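
A minimal sketch, assuming a plain explicit discretization of an inertial Newton-type dynamical system with an auxiliary state variable; the exact update, the hyperparameter values, and the toy usage below are assumptions for illustration rather than the algorithm of the talk.

import numpy as np

def inertial_newton_like_step(theta, psi, grad, gamma=0.1, alpha=0.5, beta=1.0):
    # One step on the parameters theta with auxiliary state psi; grad is a
    # (stochastic) gradient or generalized-gradient estimate at theta.
    common = (1.0 / beta - alpha) * theta - (1.0 / beta) * psi
    theta_next = theta + gamma * (common - beta * grad)  # mixes a gradient step with inertia
    psi_next = psi + gamma * common                      # auxiliary state carrying inertial information
    return theta_next, psi_next

# Toy usage on f(x) = 0.5 * ||x||^2, whose gradient at theta is theta itself.
theta, psi = np.ones(3), np.zeros(3)
for _ in range(200):
    theta, psi = inertial_newton_like_step(theta, psi, grad=theta)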

An Inertial Block Majorization Minimization Framework for Nonsmooth Nonconvex Optimization
Le Thi Khanh Hien, Duy Nhat Phan, Nicolas Gillis
We introduce TITAN, an inertial block majorization-minimization framework for nonsmooth nonconvex optimization problems. TITAN is a block coordinate method that embeds an inertial force into each majorization-minimization step of the block updates. The inertial force is obtained via an extrapolation operator that subsumes heavy-ball and Nesterov-type accelerations for block proximal gradient methods as special cases. We study subsequential as well as global convergence of the sequence generated by TITAN. We illustrate the effectiveness of TITAN on the matrix completion problem.
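
A minimal sketch of one special case subsumed by such a framework, as the abstract notes: a heavy-ball/Nesterov-type extrapolated block gradient update for a two-block matrix-factorization model of matrix completion. The extrapolation weight, step sizes, and helper names below are illustrative assumptions, not TITAN itself; with no nonsmooth term, the proximal step reduces to the identity.

import numpy as np

def inertial_block_pass(M, mask, U, V, U_prev, V_prev, w=0.4):
    # One inertial pass over the two blocks of min_{U,V} 0.5*||mask*(U V - M)||_F^2.
    # Block U: extrapolate with the previous iterate, then take a gradient step
    # with an estimated block Lipschitz step size.
    U_bar = U + w * (U - U_prev)
    R = mask * (U_bar @ V - M)
    L_u = np.linalg.norm(V @ V.T, 2) + 1e-12
    U_new = U_bar - (R @ V.T) / L_u

    # Block V: same scheme, using the freshly updated U.
    V_bar = V + w * (V - V_prev)
    R = mask * (U_new @ V_bar - M)
    L_v = np.linalg.norm(U_new.T @ U_new, 2) + 1e-12
    V_new = V_bar - (U_new.T @ R) / L_v
    return U_new, V_new

# Toy usage: complete a rank-3 matrix from roughly half of its entries.
rng = np.random.default_rng(0)
M = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 25))
mask = (rng.random(M.shape) < 0.5).astype(float)
U, V = rng.standard_normal((30, 3)), rng.standard_normal((3, 25))
U_prev, V_prev = U.copy(), V.copy()
for _ in range(100):
    U_new, V_new = inertial_block_pass(M, mask, U, V, U_prev, V_prev)
    U_prev, V_prev, U, V = U, V, U_new, V_new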
