WD-02: Advances in mathematical optimization for machine learning and data analysis – Part II
Stream: Advances in mathematical optimization for machine learning
Room: Turing
Chair(s): Dimitri Papadimitriou

A Bregman Method for Structure Learning on Sparse Directed Acyclic Graphs
Manon Romain
We develop a Bregman proximal gradient method for structure learning on linear structural causal models. Although the problem is non-convex, NP-hard, and highly curved, Bregman gradient methods neutralize at least part of the curvature's impact by measuring smoothness against a highly nonlinear kernel. This lets the method take longer steps and significantly improves convergence. Each iteration requires solving a Bregman proximal subproblem, which is convex and efficiently solvable for our particular choice of kernel. We test our method on a variety of synthetic and real datasets.
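
As a rough illustration of the iteration template behind this talk, here is a minimal Bregman proximal gradient loop in Python. The kernel, step size, and toy quadratic objective below are illustrative placeholders, not the authors' setup; in particular, the talk's method uses a highly nonlinear kernel tailored to the DAG structure-learning objective, which this sketch does not reproduce.

    import numpy as np

    def bregman_proximal_gradient(grad_f, prox_step, x0, step_size=1.0, n_iters=200):
        # Generic Bregman proximal gradient loop (sketch, not the authors' code).
        # Each iteration linearizes f at x_k and solves the convex subproblem
        #   x_{k+1} = argmin_x <grad_f(x_k), x> + (1/step_size) * D_h(x, x_k),
        # where D_h is the Bregman divergence of the chosen kernel h.
        x = x0.copy()
        for _ in range(n_iters):
            x = prox_step(x, grad_f(x), step_size)
        return x

    # With the Euclidean kernel h(x) = ||x||^2 / 2 the subproblem solution is an
    # ordinary gradient step; this placeholder does not capture the nonlinear
    # kernel the talk relies on.
    euclidean_step = lambda x, g, t: x - t * g

    x_min = bregman_proximal_gradient(
        grad_f=lambda x: 2.0 * (x - 3.0),  # gradient of the toy loss ||x - 3||^2
        prox_step=euclidean_step,
        x0=np.zeros(5),
        step_size=0.1,
    )

Swapping `euclidean_step` for a solver of the subproblem under a different kernel changes the geometry of the steps without changing the outer loop.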

An adaptive, accelerated algorithm for stochastic constrained convex optimization
Ali Kavis, Kfir Levy, Francis Bach, Volkan Cevher
We propose a novel adaptive, accelerated algorithm for stochastic constrained convex optimization. Our method, inspired by the Mirror-Prox method, simultaneously achieves the optimal rates for both smooth and non-smooth problems with either deterministic or stochastic first-order oracles, without any prior knowledge of the smoothness or the noise properties of the problem. To the best of our knowledge, this is the first adaptive, unified algorithm to achieve optimal rates in the constrained setting. Extensive numerical experiments demonstrate the performance of our framework.
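
For intuition about the "adaptive" ingredient, here is an AdaGrad-style projected stochastic gradient sketch in Python. It is not the authors' accelerated Mirror-Prox-type scheme; the unit-ball constraint, noise model, and step rule are illustrative assumptions. What it does show is step sizes set purely from observed gradients, with no prior knowledge of smoothness or noise levels.

    import numpy as np

    def adaptive_projected_sgd(stoch_grad, project, x0, n_iters=2000, eta=1.0):
        # AdaGrad-style adaptive projected stochastic gradient (sketch only;
        # not the authors' accelerated method). Step sizes shrink with the
        # running sum of squared gradient norms, so no smoothness or noise
        # constants are required in advance.
        x = x0.copy()
        grad_sq_sum = 0.0
        for _ in range(n_iters):
            g = stoch_grad(x)
            grad_sq_sum += float(np.dot(g, g))
            x = project(x - (eta / np.sqrt(grad_sq_sum + 1e-12)) * g)
        return x

    # Toy constrained problem: minimize E||x - z||^2 over the unit ball,
    # with z drawn around a point outside the ball (illustrative setup).
    rng = np.random.default_rng(0)
    project_unit_ball = lambda x: x / max(1.0, np.linalg.norm(x))
    x_hat = adaptive_projected_sgd(
        stoch_grad=lambda x: 2.0 * (x - (3.0 + rng.standard_normal(x.shape))),
        project=project_unit_ball,
        x0=np.zeros(3),
    )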

Recent Advances in the Theory of (Adaptive) Stochastic Gradient Methods
Vivak Patel
Stochastic Gradient Descent (SGD) and its adaptive variants are foundational algorithms in data science, so it is essential to understand their behavior. In this talk, we establish several novel results. For standard SGD, we demonstrate L2 and strong convergence for a broad class of nonconvex functions, subsuming most recent results. Moreover, we provide a unified consistency and asymptotic-normality analysis for a broad class of convex functions and adaptive methods, improving on many recent efforts.
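
To fix notation, the plain SGD iteration that the talk analyzes can be sketched in a few lines of Python. The nonconvex test function, noise model, and Robbins-Monro step schedule below are illustrative assumptions, not the speaker's setup.

    import numpy as np

    def sgd(stoch_grad, x0, step_fn, n_iters=5000):
        # Plain SGD: x_{k+1} = x_k - alpha_k * g_k with a diminishing step
        # schedule (sketch of the iteration the talk studies, not its analysis).
        x = float(x0)
        for k in range(n_iters):
            x -= step_fn(k) * stoch_grad(x)
        return x

    # Nonconvex toy objective f(x) = x^2 + 2*sin(3x) with Gaussian gradient
    # noise and the classical Robbins-Monro schedule alpha_k = 1 / (k + 10),
    # which satisfies sum(alpha_k) = inf and sum(alpha_k^2) < inf.
    rng = np.random.default_rng(1)
    x_final = sgd(
        stoch_grad=lambda x: 2.0 * x + 6.0 * np.cos(3.0 * x) + rng.standard_normal(),
        x0=2.0,
        step_fn=lambda k: 1.0 / (k + 10),
    )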
