Beyond First-order Optimization Methods for Machine Learning – Part III

FA-02: Beyond First-order Optimization Methods for Machine Learning – Part III
Stream: Advances in mathematical optimization for machine learning
Room: Turing
Chair(s): Fred Roosta, Albert Berahas

Large-scale derivative-free optimization using subspace methods
Lindon Roberts, Coralia Cartis
In existing techniques for model-based derivative-free optimization, the computational cost of constructing local models and Lagrange polynomials can be high. As a result, these algorithms are not as suitable for large-scale problems as derivative-based methods. In this talk, I will discuss a model-based derivative-free algorithm based on exploration of random subspaces, its worst-case complexity bounds, and some numerical results.

Efficient Newton Methods for Robust Stochastic Nonconvex Optimization
Thomas O’Leary-Roseberry, Nick Alger, Omar Ghattas
In this talk we explore how low dimensional geometric information of Hessians arising in stochastic nonconvex optimization problems can be used to design matrix-free Newton methods that have similar per-iteration computational complexity to first order methods. These methods can reduce the dependence on hyperparameters and improve convergence. Further these methods can be adapted to guard against overfitting by using the Hessian to detect noise dominated geometric information of the stochastic energy landscape.

A stochastic second order-type extra-step scheme for nonsmooth nonconvex optimization
Andre Milzarek, Minghan Yang, Zaiwen Wen, Tong Zhang
We present a stochastic second order-type extra-step method for solving nonsmooth nonconvex optimization problems. We assume that gradient and Hessian information of the smooth part of the objective function can only be approximated. Our method combines stochastic higher order steps for an underlying optimality condition with stochastic proximal steps to ensure convergence in expectation. A variant of our approach using variance reduction is discussed and we show that it enjoys better convergence properties. Numerical results on logistic regression and sparse deep learning are provided.

Doubly Adaptive Scaled Algorithm for Machine Learning Using Second-Order Information
Martin Takac, Majid Jahani, Sergey Rusakov, Zheng Shi, Peter Richtarik, Michael Mahoney
This paper presents a novel adaptive optimization algorithm for large-scale machine learning problems. Equipped with a low-cost estimate of local curvature and Lipschitz smoothness, our method dynamically adapts the search direction and step-size. The search direction contains the gradient information well-scaled by a diagonal preconditioning matrix that captures the local curvature information. Our methodology does not require the tedious task of learning rate tuning, as the learning rate is updated automatically without adding an extra hyper-parameter.