Gradient descent convergence for α-strongly convex functions

Let $f$ be an $\alpha$-strongly convex function and assume that, for all $\mathbf{x}$, $\|\nabla f(\mathbf{x})\|_2 \leq G$. If we run GD for $T$ steps (with adaptive step sizes), then for $\hat{\mathbf{x}}$ an average of the iterates we have: $f(\hat{\mathbf{x}}) - f(\mathbf{x}^*) \leq \frac{2G^2}{\alpha T}$
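A minimal numerical sketch of this rate. The specific choices here are assumptions, not part of the statement above: the step sizes $\eta_t = \frac{2}{\alpha(t+1)}$ (a standard schedule for this bound), simple iterate averaging for $\hat{\mathbf{x}}$, and a 1-D quadratic test function.

```python
import numpy as np

def gd_strongly_convex(grad, x0, alpha, T):
    # GD with the adaptive step sizes eta_t = 2/(alpha*(t+1))
    # (an assumed schedule) and plain averaging of the iterates.
    x = np.array(x0, dtype=float)
    avg = np.zeros_like(x)
    for t in range(1, T + 1):
        eta = 2.0 / (alpha * (t + 1))
        x = x - eta * grad(x)
        avg += x
    return avg / T  # x_hat

# Test problem: f(x) = (alpha/2)(x - x_star)^2, minimized at x_star.
alpha, x_star = 2.0, 1.0
f = lambda x: 0.5 * alpha * (x - x_star) ** 2
grad = lambda x: alpha * (x - x_star)

x0, T = np.array([5.0]), 100
G = alpha * abs(x0[0] - x_star)  # gradient-norm bound along this trajectory
x_hat = gd_strongly_convex(grad, x0, alpha, T)
gap = float(f(x_hat)[0])         # f(x_hat) - f(x_star), and f(x_star) = 0
bound = 2 * G**2 / (alpha * T)
print(gap <= bound)              # the suboptimality gap respects 2G^2/(alpha*T)
```

On this easy quadratic the observed gap sits far below the worst-case bound; the bound is tight only for harder instances.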

Corollary

If $T = O\left(\frac{G^2}{\alpha \epsilon}\right)$ we have $f(\hat{\mathbf{x}}) - f(\mathbf{x}^*) \leq \epsilon$.
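Concretely, solving $\frac{2G^2}{\alpha T} \leq \epsilon$ for $T$ gives $T \geq \frac{2G^2}{\alpha \epsilon}$. A quick check with hypothetical numbers (the values of $G$, $\alpha$, $\epsilon$ below are illustrative assumptions):

```python
import math

# Hypothetical instance: G = 10, alpha = 0.5, target accuracy eps = 1e-3.
G, alpha, eps = 10.0, 0.5, 1e-3
# Smallest integer T satisfying 2G^2/(alpha*T) <= eps.
T = math.ceil(2 * G**2 / (alpha * eps))
print(T)  # 400000
```

Note the $1/\epsilon$ dependence: halving the target accuracy doubles the required number of steps.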


See: Gradient descent convergence bound

Also compare: Gradient descent convergence for β-smooth functions