TE-02: Beyond First-order Optimization Methods for Machine Learning – Part II
Stream: Advances in mathematical optimization for machine learning
Room: Turing
Chair(s): Fred Roosta, Albert Berahas

Full-low evaluation type methods for derivative-free optimization
Luis Nunes Vicente, Albert Berahas, Oumaima Sohab
We propose a new class of methods for derivative-free optimization (DFO) that combines two types of iterations. The first is expensive in function evaluations but performs well in the smooth, non-noisy case; here we consider BFGS applied to gradients approximated by finite differences. The second is cheap in function evaluations and better suited to noisy or non-smooth problems; here we consider probabilistic direct search with two random directions. The resulting method is globally convergent in the non-smooth case and attains the appropriate rates in the smooth case. Numerical results show that it is efficient and robust across all types of problems.
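
As a concrete illustration, below is a minimal Python sketch of a two-mode derivative-free loop in the spirit described above: an expensive finite-difference BFGS iteration and a cheap direct-search iteration over two opposite random directions. The mode-switching rule, step-size updates, and safeguards are simplifying assumptions, not the authors' exact algorithm.

```python
# Minimal sketch of a full-low evaluation style DFO loop (illustrative only;
# the mode-switching rule, step sizes, and safeguards are simplifying
# assumptions, not the authors' exact algorithm).
import numpy as np

def fd_gradient(f, x, h=1e-6):
    """Forward-difference gradient: costs len(x) extra function evaluations."""
    g, fx = np.zeros_like(x), f(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - fx) / h
    return g

def full_low_eval_dfo(f, x0, iters=100, step=0.5):
    x = np.asarray(x0, float)
    H = np.eye(len(x))                 # BFGS inverse-Hessian approximation
    g = fd_gradient(f, x)
    mode = "full"                      # start with the expensive mode
    rng = np.random.default_rng(0)
    for _ in range(iters):
        if mode == "full":
            # Full-eval iteration: finite-difference BFGS with backtracking.
            d = -H @ g
            t = 1.0
            while f(x + t * d) > f(x) + 1e-4 * t * (g @ d) and t > 1e-10:
                t *= 0.5
            if g @ d < 0 and t > 1e-10:
                x_new = x + t * d
                g_new = fd_gradient(f, x_new)
                s, y = x_new - x, g_new - g
                if s @ y > 1e-12:      # standard BFGS curvature safeguard
                    rho = 1.0 / (s @ y)
                    V = np.eye(len(x)) - rho * np.outer(s, y)
                    H = V @ H @ V.T + rho * np.outer(s, s)
                x, g = x_new, g_new
            else:
                mode = "low"           # assumed trigger to switch modes
        else:
            # Low-eval iteration: probabilistic direct search over two opposite
            # random directions, costing only a couple of function evaluations.
            d = rng.standard_normal(len(x))
            d /= np.linalg.norm(d)
            improved = False
            for trial in (x + step * d, x - step * d):
                if f(trial) < f(x) - 1e-8 * step**2:   # sufficient decrease
                    x, improved = trial, True
                    break
            step = 2.0 * step if improved else 0.5 * step
            if improved:
                mode, g = "full", fd_gradient(f, x)    # assumed switch back
    return x

# Example: minimize a smooth quadratic without derivatives.
print(full_low_eval_dfo(lambda z: np.sum((z - 1.0) ** 2), np.zeros(5)))
```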

Newton-MR Algorithms With Complexity Guarantees for Non-convex Optimization
Fred Roosta
Classically, the conjugate gradient (CG) method has been the dominant solver in most inexact Newton-type methods for unconstrained optimization. In this talk, we consider replacing CG with the minimum residual method (MINRES), which is often used for symmetric but possibly indefinite linear systems. We show that MINRES has an inherent ability to detect negative-curvature directions. Equipped with this, we discuss line-search and trust-region variants of Newton-MR, which can be used for optimization of general non-convex objectives, and that come with favourable complexity guarantees.
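
To make the structure concrete, the following sketch shows a line-search inexact Newton iteration that uses SciPy's MINRES as the inner solver. The negative-curvature detection that the talk attributes to MINRES itself is imitated here by an explicit curvature test on the returned direction; the tolerances and fallback direction are assumptions, not the Newton-MR algorithm as presented.

```python
# Illustrative line-search "Newton-MR"-style iteration using SciPy's MINRES as
# the inner solver. Negative curvature is detected here by an explicit test on
# the returned direction; tolerances and fallbacks are assumptions.
import numpy as np
from scipy.sparse.linalg import minres

def newton_mr_sketch(f, grad, hess, x0, iters=100, gtol=1e-8):
    x = np.asarray(x0, float)
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < gtol:
            break
        H = hess(x)
        d, _ = minres(H, -g, maxiter=100)     # symmetric, possibly indefinite
        if d @ H @ d <= 0 or g @ d >= 0:
            # Non-positive curvature or non-descent detected: fall back to a
            # negative-curvature or steepest-descent direction (assumption).
            lam, V = np.linalg.eigh(H)
            d = -np.sign(g @ V[:, 0]) * V[:, 0] if lam[0] < 0 else -g
        t = 1.0                               # Armijo backtracking line search
        while f(x + t * d) > f(x) + 1e-4 * t * (g @ d) and t > 1e-12:
            t *= 0.5
        x = x + t * d
    return x

# Example: a classic non-convex test problem (2D Rosenbrock).
f = lambda z: (1 - z[0])**2 + 100 * (z[1] - z[0]**2)**2
grad = lambda z: np.array([-2 * (1 - z[0]) - 400 * z[0] * (z[1] - z[0]**2),
                           200 * (z[1] - z[0]**2)])
hess = lambda z: np.array([[2 - 400 * z[1] + 1200 * z[0]**2, -400 * z[0]],
                           [-400 * z[0], 200.0]])
print(newton_mr_sketch(f, grad, hess, np.array([-1.2, 1.0])))
```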

Inexact Restoration with Subsampled Trust-region methods for finite-sum minimization
Stefania Bellavia, Natasa Krejic, Benedetta Morini
In this talk we will focus on subsampled second-order trust-region procedures combined with the inexact restoration approach for finite-sum minimization. The sample size used in the function approximations is computed through a deterministic rule inspired by the inexact restoration method, and the trust-region step is either accepted or rejected using a suitable merit function. We discuss local and global convergence properties for finding approximate first- and second-order optimal points and present results from our numerical experiments.
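
The sketch below shows only the overall shape of a subsampled trust-region iteration for a finite sum: a Cauchy-point step on a subsampled model, a ratio-based accept/reject test, and a sample size that grows when steps are rejected. The sample-size rule and acceptance test are placeholders standing in for the inexact-restoration machinery, not the authors' scheme.

```python
# Subsampled trust-region sketch for finite-sum minimization. The sample-size
# rule and acceptance test are placeholders, not the inexact-restoration rule.
import numpy as np

def subsampled_tr(batch, n, x0, iters=200, delta=1.0):
    """batch(x, idx) -> (f, g, H): loss, gradient, Hessian averaged over idx."""
    x = np.asarray(x0, float)
    m = max(2, n // 10)                       # initial sample size
    rng = np.random.default_rng(0)
    for _ in range(iters):
        idx = rng.choice(n, size=m, replace=False)
        fval, g, H = batch(x, idx)
        # Cauchy point: cheapest minimizer of the trust-region model.
        gnorm = np.linalg.norm(g) + 1e-16
        gHg = g @ H @ g
        tau = 1.0 if gHg <= 0 else min(1.0, gnorm**3 / (delta * gHg))
        s = -(tau * delta / gnorm) * g
        pred = -(g @ s + 0.5 * s @ H @ s)     # predicted (model) decrease
        ared = fval - batch(x + s, idx)[0]    # actual decrease on same sample
        if ared / (pred + 1e-16) >= 0.1:      # accept the step
            x, delta = x + s, min(10.0, 2.0 * delta)
        else:                                 # reject: shrink the region and
            delta *= 0.5                      # grow the sample (placeholder
            m = min(n, int(np.ceil(1.5 * m))) # for the deterministic rule)
    return x

# Example: finite-sum least squares, f_i(x) = 0.5 * (a_i @ x - b_i)**2.
rng = np.random.default_rng(1)
A, x_star = rng.normal(size=(1000, 5)), np.ones(5)
b = A @ x_star + 0.01 * rng.normal(size=1000)

def batch(x, idx):
    Ai = A[idx]
    r = Ai @ x - b[idx]
    return 0.5 * np.mean(r**2), Ai.T @ r / len(idx), Ai.T @ Ai / len(idx)

print(subsampled_tr(batch, len(b), np.zeros(5)))
```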

Retrospective Approximation for Stochastic Optimization
Raghu Bollapragada
We present Retrospective Approximation (RA) as a universal sequential sample-average approximation paradigm in which, during each iteration, a sample-path approximation problem is implicitly generated using an adapted sample size and solved to an adapted error tolerance with a “deterministic method” such as a line-search quasi-Newton method. The principal advantage of RA is that it decouples optimization from stochastic approximation, allowing the direct adoption of existing deterministic algorithms without modification and thus mitigating the need to redesign algorithms for the stochastic context.
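
A minimal sketch of such an outer loop follows, assuming a doubling sample-size schedule, a halving tolerance schedule, and SciPy's BFGS as the deterministic inner solver; none of these particular choices are prescribed by the talk.

```python
# Minimal Retrospective Approximation outer loop: each iteration draws a larger
# sample, forms the corresponding sample-average problem, and solves it to a
# tighter tolerance with an off-the-shelf deterministic solver (SciPy's BFGS
# here). The doubling/halving schedules are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

def retrospective_approximation(stoch_loss, x0, dim_xi, outer=8,
                                n0=32, tol0=1e-1, rng=None):
    """stoch_loss(x, xi_batch) -> average loss over the sampled scenarios xi."""
    rng = rng if rng is not None else np.random.default_rng(0)
    x, n, tol = np.asarray(x0, float), n0, tol0
    for _ in range(outer):
        xi = rng.normal(size=(n, dim_xi))           # fresh sample path
        saa = lambda z, xi=xi: stoch_loss(z, xi)    # fixed sample-average problem
        res = minimize(saa, x, method="BFGS", options={"gtol": tol})
        x = res.x                                   # warm start the next problem
        n, tol = 2 * n, tol / 2                     # adapted sample size / tolerance
    return x

# Example: stochastic quadratic E[0.5 * ||x - xi||^2], minimized at E[xi] = 0.
loss = lambda x, xi: 0.5 * np.mean(np.sum((x[None, :] - xi) ** 2, axis=1))
print(retrospective_approximation(loss, x0=np.ones(3), dim_xi=3))
```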

Global optimization using random embeddings
Estelle Massart, Coralia Cartis, Adilet Otemissov
We present a general random-subspace algorithmic framework for global optimization and analyse its convergence using tools from conic integral geometry and random matrix theory. We then particularise this framework and its analysis to the class of functions with low effective dimension. We show that the convergence does not depend on the ambient dimension and that the effective dimension can be estimated during the run of the algorithm. Encouraging numerical results, using either local or global solvers in the subspace, are also presented.
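
For illustration, the sketch below minimizes a high-dimensional function with low effective dimension over random Gaussian subspaces, restarting with fresh embeddings; the subspace solver (SciPy's differential evolution), the bounds on the subspace variables, and the restart count are assumptions rather than the framework's prescribed choices.

```python
# Random-embedding sketch: minimize f(A @ y) over random d-dimensional Gaussian
# subspaces of R^D, restarting with fresh embeddings. The subspace solver, the
# bounds on y, and the restart count are assumptions.
import numpy as np
from scipy.optimize import differential_evolution

def random_embedding_gopt(f, D, d, n_embeddings=5, y_bound=5.0, seed=0):
    rng = np.random.default_rng(seed)
    best_x, best_f = None, np.inf
    for _ in range(n_embeddings):
        A = rng.normal(size=(D, d)) / np.sqrt(d)     # random Gaussian embedding
        reduced = lambda y, A=A: f(A @ y)            # subspace problem in R^d
        res = differential_evolution(reduced, [(-y_bound, y_bound)] * d,
                                     seed=int(rng.integers(2**31)), tol=1e-6)
        if res.fun < best_f:
            best_x, best_f = A @ res.x, res.fun
    return best_x, best_f

# Example: a 100-dimensional function whose value depends only on the first two
# coordinates (effective dimension 2), with global minimum value 0.
def f(x):
    u, v = x[0] - 1.0, x[1] + 2.0
    return (u**2 + v**2) * (1.0 + 0.1 * np.sin(3.0 * u) ** 2)

x_best, f_best = random_embedding_gopt(f, D=100, d=4)
print(f_best)     # expected to be close to the global minimum value 0
```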