Projected gradient descent

Projected gradient descent (PGD) is a simple modification of gradient descent for solving constrained problems of the form $\min_{x \in \mathcal{S}} f(x)$. Instead of setting the next iterate to

\begin{align*} x^{(t+1)} = x^{(t)} - \eta \nabla f(x^{(t)}), \end{align*}

we set the next iterate to

\begin{align*} x^{(t+1)} = P_{\mathcal{S}}\left( x^{(t)} - \eta \nabla f(x^{(t)}) \right), \end{align*}

where $P_{\mathcal{S}}$ denotes Euclidean projection onto the constraint set $\mathcal{S}$.
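As a concrete illustration (not from the references), here is a minimal sketch in Python of one PGD update, assuming $\mathcal{S}$ is the Euclidean ball of radius $r$, for which the projection has a simple closed form. The function names and the choice of constraint set are assumptions made for this example.

```python
import numpy as np

def project_onto_ball(y, r=1.0):
    """Euclidean projection of y onto the l2 ball {x : ||x||_2 <= r} (illustrative choice of S)."""
    norm = np.linalg.norm(y)
    return y if norm <= r else (r / norm) * y

def pgd_step(x, grad_f, eta, r=1.0):
    """One projected gradient descent update: gradient step, then projection back onto S."""
    y = x - eta * grad_f(x)           # plain gradient step
    return project_onto_ball(y, r)    # x^{(t+1)} = P_S(y)
```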

Projected gradient descent convergence bound

(PGD convergence bound)

If $f$ and $\mathcal{S}$ are convex, $f$ is $G$-Lipschitz, $\|x^{(1)} - \mathbf{x}^*\|_2 \leq R$, the step size is $\eta = \frac{R}{G\sqrt{T}}$, and $T \geq \frac{R^2 G^2}{\epsilon^2}$, then $f(\hat{\mathbf{x}}) \leq f(\mathbf{x}^*) + \epsilon$, where $\mathbf{x}^*$ minimizes $f$ over $\mathcal{S}$ and $\hat{\mathbf{x}} = \frac{1}{T}\sum_{t=1}^{T} x^{(t)}$ is the average iterate.
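For a rough sense of scale, the bound can be evaluated numerically. The values of $R$, $G$, and $\epsilon$ below are arbitrary illustrations, and $\eta = R/(G\sqrt{T})$ is the standard step size used to obtain this guarantee (see the references).

```python
import math

R, G, eps = 1.0, 10.0, 1e-2             # example values, chosen only for illustration
T = math.ceil((R**2 * G**2) / eps**2)   # iteration bound from the theorem: T >= R^2 G^2 / eps^2
eta = R / (G * math.sqrt(T))            # standard step size for this guarantee
print(T, eta)                           # prints 1000000 0.0001
```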

Analysis

The key additional fact beyond the unconstrained GD analysis is that projection onto a convex set is non-expansive toward points of the set: for any $\mathbf{y}$ and any $\mathbf{x}^* \in \mathcal{S}$, $\|P_{\mathcal{S}}(\mathbf{y}) - \mathbf{x}^*\|_2 \leq \|\mathbf{y} - \mathbf{x}^*\|_2$. So the projection step never increases the distance to the optimum, and the standard GD convergence argument goes through unchanged.

Compare PGD to GD

In GD (a minimal code sketch follows the list):

  1. Pick an initial point $\mathbf{x}_0 \in \mathbb{R}^n$
  2. Loop until stopping condition is met:
    1. descent direction: compute $-\nabla f(\mathbf{x}_k)$
    2. stepsize: pick a stepsize $\alpha_k$
    3. update: $\mathbf{x}_{k+1} = \mathbf{x}_{k} - \alpha_k \nabla f(\mathbf{x}_k)$
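A minimal sketch of this loop in Python; the constant stepsize, the gradient-norm stopping rule, and the function names are illustrative assumptions rather than part of the notes.

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, max_iters=1000, tol=1e-8):
    """Plain gradient descent with a constant stepsize and a gradient-norm stopping rule."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad_f(x)                    # descent direction is -g
        if np.linalg.norm(g) < tol:      # stopping condition
            break
        x = x - alpha * g                # update: x_{k+1} = x_k - alpha_k * grad f(x_k)
    return x

# Example: minimize f(x) = ||x - 1||^2, whose gradient is 2(x - 1)
x_min = gradient_descent(lambda x: 2 * (x - np.ones(2)), x0=np.zeros(2))
```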

In PGD (a minimal code sketch follows the list):

  1. Pick an initial point $\mathbf{x}_0 \in \mathcal{Q}$, where $\mathcal{Q}$ is the convex set constraining the solution (denoted $\mathcal{S}$ above)
  2. Loop until stopping condition is met:
    1. descent direction: compute $-\nabla f(\mathbf{x}_k)$
    2. stepsize: pick a stepsize $\alpha_k$
    3. update: $\mathbf{y}_{k+1} = \mathbf{x}_{k} - \alpha_k \nabla f(\mathbf{x}_k)$
    4. projection: $\mathbf{x}_{k+1} = {\arg\min}_{\mathbf{x} \in \mathcal{Q}} \frac{1}{2}\|\mathbf{x}-\mathbf{y}_{k+1}\|_2^2$
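The same loop with the extra projection step, sketched for the illustrative choice $\mathcal{Q} = [0, 1]^n$, where the argmin in step 4 reduces to coordinate-wise clipping; as before, names, parameters, and the fixed iteration count are assumptions for this example.

```python
import numpy as np

def projected_gradient_descent(grad_f, x0, project, alpha=0.1, max_iters=1000):
    """PGD: after each gradient step, project the iterate back onto Q."""
    x = project(np.asarray(x0, dtype=float))   # start from a feasible point in Q
    for _ in range(max_iters):
        y = x - alpha * grad_f(x)              # update: y_{k+1} = x_k - alpha_k * grad f(x_k)
        x = project(y)                         # projection: argmin_{x in Q} ||x - y_{k+1}||^2
    return x

# Example: Q = [0, 1]^n, for which the projection is coordinate-wise clipping
project_box = lambda y: np.clip(y, 0.0, 1.0)
x_min = projected_gradient_descent(lambda x: 2 * (x - 2.0), x0=np.zeros(3), project=project_box)
# The unconstrained minimizer of ||x - 2||^2 lies outside Q, so PGD converges to the boundary point [1, 1, 1].
```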

References:

  1. https://angms.science/doc/CVX/CVX_PGD.pdf
  2. https://www.chrismusco.com/amlds2023/notes/lecture06.html#Projected_Gradient_Descent