Gradient descent convergence bound
For a convex, $G$-Lipschitz function $f$ with minimizer $x^*$, where $R = \|x_1 - x^*\|$ is the starting radius and $\bar{x} = \frac{1}{T}\sum_{t=1}^{T} x_t$ is the average iterate:

If we run GD for $T$ iterations with a fixed step-size $\eta$, then
$$f(\bar{x}) - f(x^*) \le \frac{R^2}{2\eta T} + \frac{\eta G^2}{2}.$$

If we run GD for $T$ iterations with step-size $\eta = \frac{R}{G\sqrt{T}}$, then
$$f(\bar{x}) - f(x^*) \le \frac{RG}{\sqrt{T}}.$$
In other words, (projected) gradient descent returns a point $\bar{x}$ with $f(\bar{x}) - f(x^*) \le \epsilon$ after $T = \frac{R^2 G^2}{\epsilon^2}$ iterations.
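A minimal numerical sketch of this bound, not taken from the cited notes: the test function $f(x) = \|x\|_1$ (with subgradient $\operatorname{sign}(x)$, so $G = \sqrt{n}$), the ball constraint, and all names are illustrative assumptions. It runs projected GD with $\eta = R/(G\sqrt{T})$ and compares $f(\bar{x}) - f(x^*)$ to $RG/\sqrt{T}$.

```python
# Illustrative sketch (not from the cited notes): projected (sub)gradient descent
# with step-size eta = R / (G * sqrt(T)), compared against the RG/sqrt(T) bound.
import numpy as np

def project_onto_ball(x, center, radius):
    """Euclidean projection onto {y : ||y - center||_2 <= radius}."""
    d = x - center
    norm = np.linalg.norm(d)
    return x if norm <= radius else center + radius * d / norm

def projected_gd(grad, x1, project, R, G, T):
    """T steps of projected GD with eta = R/(G*sqrt(T)); returns the average iterate."""
    eta = R / (G * np.sqrt(T))
    x = x1
    iterates = []
    for _ in range(T):
        iterates.append(x)
        x = project(x - eta * grad(x))      # gradient step, then project back
    return np.mean(iterates, axis=0)        # x_bar = (1/T) * sum_t x_t

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 20
    f = lambda x: np.sum(np.abs(x))         # convex, G-Lipschitz with G = sqrt(n)
    grad = lambda x: np.sign(x)             # a subgradient of the l1 norm
    x_star = np.zeros(n)                    # minimizer of f
    G = np.sqrt(n)
    x1 = rng.normal(size=n)
    R = np.linalg.norm(x1 - x_star)         # starting radius
    T = 10_000
    x_bar = projected_gd(grad, x1, lambda y: project_onto_ball(y, x_star, R), R, G, T)
    print("f(x_bar) - f(x*):", f(x_bar) - f(x_star))
    print("bound RG/sqrt(T):", R * G / np.sqrt(T))
```

Since $\|x\|_1$ is non-smooth, the sketch uses a subgradient in place of the gradient; the Lipschitz analysis above applies unchanged.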
Proof
Claim 1: For all $t$,
$$f(x_t) - f(x^*) \le \nabla f(x_t)^\top (x_t - x^*).$$

Claim 1(a): For all $t$ and all $y$,
$$f(y) \ge f(x_t) + \nabla f(x_t)^\top (y - x_t).$$

Claim 1 follows from Claim 1(a), the first-order definition of convexity, by taking $y = x^*$.
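Spelled out (a standard one-line step, written here for completeness): setting $y = x^*$ in Claim 1(a) and rearranging gives
$$f(x^*) \ge f(x_t) + \nabla f(x_t)^\top (x^* - x_t) \quad\Longrightarrow\quad f(x_t) - f(x^*) \le \nabla f(x_t)^\top (x_t - x^*),$$
which is exactly Claim 1.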
#incomplete
See Gradient descent
References:
- https://www.stat.cmu.edu/~ryantibs/convexopt-F13/scribes/lec6.pdf