Gradient descent convergence bound

For a convex, $G$-Lipschitz function $f$, where $R$ is the starting radius, $\lVert \mathbf{x}^* - \mathbf{x}^{(0)} \rVert_2 \leq R$:

If we run GD for $T \geq \frac{R^2 G^2}{\epsilon^2}$ iterations with step size $\eta = \frac{R}{G \sqrt{T}}$, then $f(\hat{\mathbf{x}}) \leq f(\mathbf{x}^*) + \epsilon$.
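
Where this step size comes from (a standard calculation, not spelled out in the note): the telescoping argument in the proof below bounds the suboptimality by $\frac{R^2}{2\eta T} + \frac{\eta G^2}{2}$, and $\eta = \frac{R}{G\sqrt{T}}$ balances the two terms:

$$\frac{R^2}{2\eta T} + \frac{\eta G^2}{2} = \frac{RG}{2\sqrt{T}} + \frac{RG}{2\sqrt{T}} = \frac{RG}{\sqrt{T}},$$

which is at most $\epsilon$ exactly when $T \geq \frac{R^2 G^2}{\epsilon^2}$.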


(Projected) gradient descent returns $\hat{\mathbf{x}}$ with $f(\hat{\mathbf{x}}) \leq \min_{\mathbf{x} \in \mathcal{S}} f(\mathbf{x}) + \epsilon$ after $T = \frac{R^2 G^2}{\epsilon^2}$ iterations.
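
A minimal Python sketch of the procedure, under assumptions the note does not fix: $\hat{\mathbf{x}}$ is taken to be the average iterate (the choice the standard analysis controls), $\mathcal{S}$ is a Euclidean ball so projection is just rescaling, and `grad_f`/`projected_gd` are hypothetical names for a user-supplied (sub)gradient oracle and this routine.

```python
import numpy as np

def projected_gd(grad_f, x0, R, G, eps, center=None, radius=None):
    """(Projected) gradient descent with the T and eta from the bound above.

    Assumptions not fixed by the note: S is the Euclidean ball with the
    given center/radius, and x_hat is the average of the iterates.
    """
    T = int(np.ceil(R**2 * G**2 / eps**2))   # T >= R^2 G^2 / eps^2
    eta = R / (G * np.sqrt(T))               # eta = R / (G sqrt(T))
    x = np.asarray(x0, dtype=float)
    if center is None:
        center = np.zeros_like(x)
    total = np.zeros_like(x)
    for _ in range(T):
        total += x
        x = x - eta * grad_f(x)              # gradient step
        if radius is not None:               # project back onto S
            offset = x - center
            norm = np.linalg.norm(offset)
            if norm > radius:
                x = center + offset * (radius / norm)
    return total / T                         # x_hat: average iterate
```

Omitting `radius` skips the projection (unconstrained GD). Returning the average iterate rather than the last one is what the telescoping argument in the proof below bounds; returning the best iterate $\arg\min_i f(\mathbf{x}^{(i)})$ works as well.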


Proof

Claim 1: For all $i = 0, \ldots, T$,

$$f(\mathbf{x}^{(i)}) - f(\mathbf{x}^*) \leq \frac{||\mathbf{x}^{(i)} - \mathbf{x}^*||_2^2 - ||\mathbf{x}^{(i+1)} - \mathbf{x}^*||_2^2}{2\eta} + \frac{\eta G^2}{2}$$

Claim 1(a): For all $i = 0, \ldots, T$,

$$\nabla f(\mathbf{x}^{(i)})^{\mathsf{T}}(\mathbf{x}^{(i)} - \mathbf{x}^*) \leq \frac{||\mathbf{x}^{(i)} - \mathbf{x}^*||_2^2 - ||\mathbf{x}^{(i+1)} - \mathbf{x}^*||_2^2}{2\eta} + \frac{\eta G^2}{2}$$

Claim 1 follows from Claim 1(a) by the definition of convexity: $f(\mathbf{x}^*) \geq f(\mathbf{x}^{(i)}) + \nabla f(\mathbf{x}^{(i)})^{\mathsf{T}}(\mathbf{x}^* - \mathbf{x}^{(i)})$, so $f(\mathbf{x}^{(i)}) - f(\mathbf{x}^*) \leq \nabla f(\mathbf{x}^{(i)})^{\mathsf{T}}(\mathbf{x}^{(i)} - \mathbf{x}^*)$.
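
A sketch of the missing steps, following the standard argument (as in the Tibshirani lecture notes referenced below); it assumes the plain update $\mathbf{x}^{(i+1)} = \mathbf{x}^{(i)} - \eta \nabla f(\mathbf{x}^{(i)})$. Expanding the squared distance after one step:

$$||\mathbf{x}^{(i+1)} - \mathbf{x}^*||_2^2 = ||\mathbf{x}^{(i)} - \mathbf{x}^*||_2^2 - 2\eta \nabla f(\mathbf{x}^{(i)})^{\mathsf{T}}(\mathbf{x}^{(i)} - \mathbf{x}^*) + \eta^2 ||\nabla f(\mathbf{x}^{(i)})||_2^2$$

Rearranging and bounding $||\nabla f(\mathbf{x}^{(i)})||_2 \leq G$ (Lipschitzness) gives Claim 1(a). For projected GD, projecting onto the convex set $\mathcal{S}$ never increases the distance to $\mathbf{x}^* \in \mathcal{S}$, so the same bound survives the projection step. Summing Claim 1 over $i = 0, \ldots, T-1$, the distance terms telescope:

$$\frac{1}{T} \sum_{i=0}^{T-1} \left[ f(\mathbf{x}^{(i)}) - f(\mathbf{x}^*) \right] \leq \frac{||\mathbf{x}^{(0)} - \mathbf{x}^*||_2^2}{2\eta T} + \frac{\eta G^2}{2} \leq \frac{R^2}{2\eta T} + \frac{\eta G^2}{2} = \frac{RG}{\sqrt{T}} \leq \epsilon$$

Taking $\hat{\mathbf{x}} = \frac{1}{T}\sum_{i=0}^{T-1} \mathbf{x}^{(i)}$, Jensen's inequality gives $f(\hat{\mathbf{x}}) - f(\mathbf{x}^*) \leq \frac{RG}{\sqrt{T}} \leq \epsilon$.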

#incomplete


See Gradient descent


References:

  1. https://www.stat.cmu.edu/~ryantibs/convexopt-F13/scribes/lec6.pdf