Gradient descent
General gradient descent algorithm
Iteratively update the parameters $\theta$ by making a small adjustment that decreases $J(\theta)$. In particular, update $\theta \leftarrow \theta - \eta \nabla J(\theta)$, where $\eta$ is the step size.
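A minimal sketch of this update rule in Python (the quadratic objective and all names here are illustrative assumptions, not from the source):

```python
import numpy as np

def gradient_descent(grad, theta0, eta=0.1, steps=100):
    """Iterate theta <- theta - eta * grad(theta) for a fixed number of steps."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        theta = theta - eta * grad(theta)
    return theta

# Example: J(theta) = ||theta - 3||^2 has gradient 2 * (theta - 3).
print(gradient_descent(lambda t: 2 * (t - 3.0), theta0=[0.0]))  # ~[3.]
```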
Gradient descent method
Iteratively update the current estimate in the direction opposite the gradient.
The solution depends on the initial condition: if the step size is chosen properly, gradient descent reaches the local minimum closest to the initial condition.
It yields the global optimum if $J$ is convex, regardless of the initial solution (see the sketch below).
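For instance (a small self-contained illustration; the quadratic objective is an assumption for demonstration, not from the source), two very different starting points reach the same global minimizer of a convex function:

```python
# Convex J(theta) = (theta - 3)^2, gradient 2 * (theta - 3):
# the same minimizer is reached from very different starts.
for theta in [-100.0, 50.0]:
    start = theta
    for _ in range(200):
        theta = theta - 0.1 * (2 * (theta - 3.0))
    print(start, "->", theta)  # both print ~3.0
```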
Gradient descent analysis
Assume:
- $J$ is convex
- Lipschitz function: for all $\theta$, $\|\nabla J(\theta)\|_2 \le G$
- starting radius: $\|\theta^* - \theta^{(1)}\|_2 \le R$

Gradient descent (a runnable sketch follows this list):
- choose number of steps $T$ and step size $\eta$
- starting point $\theta^{(1)}$, e.g. $\theta^{(1)} = \mathbf{0}$
- for $i = 1, \ldots, T$: $\theta^{(i+1)} = \theta^{(i)} - \eta \nabla J(\theta^{(i)})$
- return $\hat{\theta} = \arg\min_{\theta^{(i)}} J(\theta^{(i)})$
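A sketch of this analyzed procedure (fixed step count, the step size $\eta = R/(G\sqrt{T})$ used in the bound below, best-iterate return; the 1-D test objective and constants are illustrative assumptions):

```python
import numpy as np

def gd_fixed_steps(J, grad, theta1, R, G, T):
    """Gradient descent with eta = R / (G * sqrt(T)), returning the best iterate."""
    eta = R / (G * np.sqrt(T))
    theta = np.asarray(theta1, dtype=float)
    best, best_val = theta.copy(), J(theta)
    for _ in range(T):
        theta = theta - eta * grad(theta)
        val = J(theta)
        if val < best_val:
            best, best_val = theta.copy(), val
    return best

# Illustrative example: J(theta) = |theta| is convex with |J'| <= 1 (G = 1),
# and the start theta = 1 is within distance R = 1 of the minimizer theta* = 0.
J = lambda t: np.abs(t).sum()
grad = lambda t: np.sign(t)
print(gd_fixed_steps(J, grad, theta1=[1.0], R=1.0, G=1.0, T=400))  # ~[0.]
```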
Gradient descent convergence bound
With step size $\eta = \frac{R}{G\sqrt{T}}$: if $T \ge \frac{R^2 G^2}{\epsilon^2}$, then $J(\hat{\theta}) \le J(\theta^*) + \epsilon$.
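A condensed version of the standard argument behind this bound (following the cited lecture notes): convexity bounds each iterate's suboptimality by an inner product, the update rule makes that inner product telescope, and the step-size choice balances the two resulting terms.

```latex
% Convexity: J(\theta^{(i)}) - J(\theta^*) \le \nabla J(\theta^{(i)})^\top (\theta^{(i)} - \theta^*).
% Expanding \|\theta^{(i+1)} - \theta^*\|_2^2 with \theta^{(i+1)} = \theta^{(i)} - \eta \nabla J(\theta^{(i)}):
\begin{align*}
\nabla J(\theta^{(i)})^\top (\theta^{(i)} - \theta^*)
  &= \frac{\|\theta^{(i)} - \theta^*\|_2^2 - \|\theta^{(i+1)} - \theta^*\|_2^2}{2\eta}
     + \frac{\eta}{2}\,\|\nabla J(\theta^{(i)})\|_2^2, \\
\sum_{i=1}^{T} \bigl( J(\theta^{(i)}) - J(\theta^*) \bigr)
  &\le \frac{R^2}{2\eta} + \frac{\eta T G^2}{2}
   = R G \sqrt{T}
  \quad \text{for } \eta = \tfrac{R}{G\sqrt{T}}.
\end{align*}
% The best iterate is at least as good as the average, so
% J(\hat\theta) - J(\theta^*) \le RG / \sqrt{T} \le \epsilon once T \ge R^2 G^2 / \epsilon^2.
```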
Gradient descent for constrained optimization
- Let $\eta = \frac{D}{G\sqrt{T}}$.
- Let $\theta^{(1)}$ be any point in $\mathcal{S}$.
- Repeat for $i = 1$ to $T$:
  - $z^{(i+1)} = \theta^{(i)} - \eta \nabla J(\theta^{(i)})$
  - $\theta^{(i+1)} = P_{\mathcal{S}}(z^{(i+1)})$, the projection of $z^{(i+1)}$ on $\mathcal{S}$
- At the end output $\hat{\theta} = \arg\min_{\theta^{(i)}} J(\theta^{(i)})$.
$D$ is the diameter of $\mathcal{S}$, or if unconstrained, then simply an upper bound on $\|\theta^{(1)} - \theta^*\|_2$. $G$ is an upper bound on the size of $J$'s gradient, i.e. $\|\nabla J(\theta)\|_2 \le G$ for all $\theta$.
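A minimal sketch of this projected variant, assuming for illustration that $\mathcal{S}$ is the Euclidean ball of radius $r$ (so the projection just rescales points that land outside, and $D = 2r$; the objective and constants are hypothetical):

```python
import numpy as np

def project_ball(z, r):
    """Euclidean projection onto the L2 ball of radius r."""
    norm = np.linalg.norm(z)
    return z if norm <= r else (r / norm) * z

def projected_gd(J, grad, theta1, D, G, T, r):
    eta = D / (G * np.sqrt(T))
    theta = np.asarray(theta1, dtype=float)
    best, best_val = theta.copy(), J(theta)
    for _ in range(T):
        z = theta - eta * grad(theta)   # unconstrained gradient step
        theta = project_ball(z, r)      # pull the iterate back into S
        val = J(theta)
        if val < best_val:
            best, best_val = theta.copy(), val
    return best

# Illustrative: minimize J(theta) = ||theta - c||^2 over the unit ball,
# with c outside the ball; the constrained optimum is c / ||c|| = (1, 0).
c = np.array([2.0, 0.0])
J = lambda t: np.sum((t - c) ** 2)
grad = lambda t: 2 * (t - c)
print(projected_gd(J, grad, theta1=[0.0, 0.0], D=2.0, G=6.0, T=500, r=1.0))
```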
See: Stochastic gradient descent, Gradient
References:
- https://www.chrismusco.com/amlds2023/notes/lecture06.html#Projected_Gradient_Descent
- https://www.cs.princeton.edu/courses/archive/fall18/cos521/Lectures/lec16.pdf
- https://en.wikipedia.org/wiki/Gradient_descent
- https://www.stat.cmu.edu/~ryantibs/convexopt-F13/scribes/lec6.pdf