A unified modular analysis of online and stochastic optimization: adaptivity, optimism, non-convexity
File(s)
OPT2016_paper_38.pdf (323.1 KB)
Published version
Author(s)
Joulani, P
Gyorgy, A
Szepesvari, C
Type
Conference Paper
Abstract
We present a simple unified analysis of adaptive Mirror Descent (MD) and Follow-
the-Regularized-Leader (FTRL) algorithms for online and stochastic optimization
in (possibly infinite-dimensional) Hilbert spaces. The analysis is modular in
the sense that it completely decouples the effect of possible assumptions on the
loss functions (such as smoothness, strong convexity, and non-convexity) and
on the optimization regularizers (such as strong convexity, non-smooth penalties
in composite-objective learning, and non-monotone step-size sequences). We
demonstrate the power of this decoupling by obtaining generalized algorithms and
improved regret bounds for the so-called “adaptive optimistic online learning” setting.
In addition, we simplify and extend a large body of previous work, including several
AdaGrad formulations, composite-objective and implicit-update algorithms. In all
cases, the results follow as simple corollaries within a few lines
of algebra. Finally, the decomposition enables us to obtain preliminary global
guarantees for limited classes of non-convex problems.
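For reference, below is a minimal sketch, in standard textbook notation, of the two update rules the abstract names (Mirror Descent and Follow-the-Regularized-Leader). The symbols here (decision set X, step size eta_t, Bregman divergence D_psi, regularizer R_t) are conventional choices assumed for illustration, not taken from this record, and the adaptive, optimistic variants analysed in the paper may parameterize them differently.

% Mirror Descent: step from x_t along the gradient g_t, penalized by a Bregman divergence.
\[
  x_{t+1} = \operatorname*{arg\,min}_{x \in \mathcal{X}}
    \; \eta_t \langle g_t, x \rangle + D_{\psi}(x, x_t)
\]
% Follow-the-Regularized-Leader: minimize the sum of linearized losses plus a regularizer R_t.
\[
  x_{t+1} = \operatorname*{arg\,min}_{x \in \mathcal{X}}
    \; \sum_{s=1}^{t} \langle g_s, x \rangle + R_t(x)
\]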
Date Issued
2016-12-10
Date Acceptance
2016-10-04
Citation
9th NIPS Workshop on Optimization for Machine Learning, 2016
Journal / Book Title
9th NIPS Workshop on Optimization for Machine Learning
Copyright Statement
© 2016 The Authors
Source
9th NIPS Workshop on Optimization for Machine Learning
Publication Status
Published
Start Date
2016-12-10
Coverage Spatial
Barcelona, Spain
OA Location
http://opt-ml.org/papers/OPT2016_paper_38.pdf