Note: You can use the results we have proved in class – no need to prove them again.
Q 1. Exercise 4 in Chapter 7 of Recht-Wright (on Canvas). [15pts]
Q 2. Consider the unconstrained optimization problem $\min_{x \in \mathbb{R}^n} f(x)$, where $f$ is an $L$-smooth convex function. Assume that $\|x_0 - x^*\|_2 \le R$ for some $R \in (0, \infty)$, and let $f_\epsilon(x) = f(x) + \frac{\epsilon}{2R^2}\|x - x_0\|_2^2$. Let $x^* = \operatorname{argmin}_{x \in \mathbb{R}^n} f(x)$ and $x_\epsilon^* = \operatorname{argmin}_{x \in \mathbb{R}^n} f_\epsilon(x)$. You have already shown in previous homework (with possibly minor modifications) that:
$$(\forall x \in \mathbb{R}^n): \quad f(x) - f(x^*) \le f_\epsilon(x) - f_\epsilon(x_\epsilon^*) + \frac{\epsilon}{2}.$$
Prove that Nesterov’s method for smooth and strongly convex minimization applied to $f_\epsilon$ will find a solution $x_k$ with $f(x_k) - f(x^*) \le \epsilon$ in $O\!\left(\sqrt{\tfrac{LR^2}{\epsilon}}\,\log\!\left(\tfrac{LR^2}{\epsilon}\right)\right)$ iterations. [5pts]
Using the lower bound for smooth minimization we have proved in class, prove the following lower bound for $L$-smooth and $m$-strongly convex optimization: any method satisfying the same assumption as we used in class (that $x_k \in x_0 + \operatorname{Lin}\{\nabla f(x_0), \dots, \nabla f(x_{k-1})\}$) must take at least $k = \Omega\!\left(\sqrt{\tfrac{L}{m}}\right)$ iterations in the worst case to construct a point $x_k$ such that $f(x_k) - f(x^*) \le \epsilon$, for any $\epsilon > 0$. [10pts]
Coding Assignment
Note: Your code needs to compile without any errors to receive any points.
If you are using Python, please follow these rules:
Please use Python 3.7+.
You may only use modules from the Python standard library plus NumPy and Matplotlib. Using other third-party modules may result in points being deducted.
You may submit your code in Jupyter notebooks or in .py files. Please archive all files containing your code into one zip/rar/tar file for submission.
Please include a README file describing how to run your code; if we cannot figure out how to run your code within a reasonable time, you will receive zero points for the entire question.
For the coding assignment, in addition to the methods you have implemented in the last homework, you should also implement the following methods:
The method of conjugate gradients, for the version we have covered in class (also in Nesterov’s book, in Section 1.3.2). You should use the Dai-Yuan rule for $\beta_k$ (the same as the one we have derived in class).
The Heavy-Ball method, which applies to $L$-smooth and $m$-strongly convex functions, and whose updates are defined by:
$$x_{k+1} = x_k - \beta_1 \nabla f(x_k) + \beta_2 (x_k - x_{k-1}),$$
where $\beta_1 = \frac{4}{(\sqrt{L} + \sqrt{m})^2}$ and $\beta_2 = \left(\frac{\sqrt{L} - \sqrt{m}}{\sqrt{L} + \sqrt{m}}\right)^2$ (a short NumPy sketch of this update appears after this list).
Nesterov’s method for smooth and strongly convex optimization (any variant you like – the one we saw in class, or the one from Chapter 4 in Recht-Wright).
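For reference, a minimal NumPy sketch of the heavy-ball iteration listed above. The interface (a gradient callable grad_f, an initial point x0, the constants L and m, and the returned list of iterates) is only one possible way to structure your code, not a required API.

```python
import numpy as np

def heavy_ball(grad_f, x0, L, m, num_iters):
    """Heavy-ball updates: x_{k+1} = x_k - beta1 * grad_f(x_k) + beta2 * (x_k - x_{k-1})."""
    beta1 = 4.0 / (np.sqrt(L) + np.sqrt(m)) ** 2
    beta2 = ((np.sqrt(L) - np.sqrt(m)) / (np.sqrt(L) + np.sqrt(m))) ** 2
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()                 # take x_{-1} = x_0, so the first step is a plain gradient step
    iterates = [x.copy()]
    for _ in range(num_iters):
        x_next = x - beta1 * grad_f(x) + beta2 * (x - x_prev)
        x_prev, x = x, x_next
        iterates.append(x.copy())
    return iterates
```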
Q 3. The problem instance we will consider first is $\min_{x \in \mathbb{R}^n} f(x)$, where $n = 100$ and $f(x) = \frac{1}{2}x^T M x - b^T x + \frac{m}{2}\|x\|_2^2$ is characterized by:
$$
M = \begin{bmatrix}
2 & -1 & 0 & \cdots & 0 & 0\\
-1 & 2 & -1 & \cdots & 0 & 0\\
0 & -1 & 2 & \cdots & 0 & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & 0 & \cdots & -1 & 2
\end{bmatrix},
\qquad
b = \begin{bmatrix} 1\\ 0\\ 0\\ \vdots\\ 0 \end{bmatrix}.
$$
M and b can be generated in Matlab using:
k = n;
M = diag(2*[ones(k, 1); zeros(n-k, 1)], 0) ...
    + diag([-ones(k-1, 1); zeros(n-k, 1)], -1) ...
    + diag([-ones(k-1, 1); zeros(n-k, 1)], 1);
b = zeros(n, 1);
b(1) = b(1) + 1;
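If you are working in Python instead, a direct NumPy translation of the Matlab snippet above (with k = n) would look roughly like this:

```python
import numpy as np

n = 100
# Tridiagonal matrix with 2 on the main diagonal and -1 on the first off-diagonals.
M = (2.0 * np.eye(n)
     - np.diag(np.ones(n - 1), k=-1)
     - np.diag(np.ones(n - 1), k=1))
b = np.zeros(n)
b[0] = 1.0
```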
Write the code that implements Nesterov’s method for smooth minimization (feel free to reuse the code from last homework), Nesterov’s method for smooth and strongly convex minimization, the method of conjugate gradients, and the heavy-ball method. Your code should produce three plots, corresponding to three different values of $m$: (1) $m = 1$; (2) $m = 0.1$; and (3) $m = 0.01$. Each plot should show the optimality gap (on a logarithmic scale) against the iteration count. Each run should be for 1000 iterations. The initial point for all the methods should be $x_0 = 0$. Discuss your results. Do you observe what you expect from what we saw in class? How does the heavy-ball method compare to the other methods? How about the two variants of Nesterov’s method? [20pts]
Now, modify Nesterov’s method for smooth minimization so that the function value over the iterates is monotonically decreasing (we have discussed in class how to do that). Ensure that your modified method in each iteration decreases the function value by at least as much as standard gradient descent (with step size $1/L$; explain how to achieve this). In how many iterations can your new method produce a point $x_k$ with $f(x_k) - f(x^*) \le \epsilon$? (You should provide a theoretical bound.) Produce the same set of plots. Observe what has changed and explain why. [20pts]
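One possible way to organize these experiments is sketched below. The solver interface solver(f, grad, x0, num_iters), returning the list of iterates, is an assumption for illustration only; adapt it to however you structured your own implementations.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_optimality_gaps(M, b, m, solvers, num_iters=1000):
    """Plot the optimality gap of each solver on f(x) = 0.5 x'Mx - b'x + 0.5*m*||x||^2."""
    n = len(b)
    A = M + m * np.eye(n)                     # Hessian of f (constant, since f is quadratic)
    f = lambda x: 0.5 * x @ M @ x - b @ x + 0.5 * m * (x @ x)
    grad = lambda x: A @ x - b
    x_star = np.linalg.solve(A, b)            # exact minimizer, used only to measure the gap
    f_star = f(x_star)
    x0 = np.zeros(n)
    for name, solver in solvers.items():      # each solver(f, grad, x0, num_iters) returns iterates
        gaps = [f(x) - f_star for x in solver(f, grad, x0, num_iters)]
        plt.semilogy(gaps, label=name)
    plt.xlabel("iteration")
    plt.ylabel("optimality gap")
    plt.title(f"m = {m}")
    plt.legend()
    plt.show()
```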
Q 4. In this part, you will compare the heavy-ball method to Nesterov’s method for smooth and strongly convex optimization. Your problem instance is the following one-dimensional problem: $\min_{x \in \mathbb{R}} f(x)$, where
$$
f(x) = \begin{cases}
\frac{25}{2}x^2, & \text{if } x < 1,\\[2pt]
\frac{1}{2}x^2 + 24x - 12, & \text{if } 1 \le x < 2,\\[2pt]
\frac{25}{2}x^2 - 24x + 36, & \text{if } x \ge 2.
\end{cases}
$$
Prove that $f$ is $m$-strongly convex and $L$-smooth with $m = 1$ and $L = 25$. What is the global minimizer of $f$? (Justify your answer.)
Run Nesterov’s method and the heavy-ball method, starting from $x_0 = 3.3$. Plot the optimality gap of Nesterov’s method and the heavy-ball method over 100 iterations. What do you observe? What does this plot tell you? [30pts]
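As a starting point, here is a minimal Python sketch of this one-dimensional objective and its gradient, following the piecewise definition written above; the function names are placeholders for your own code.

```python
def f(x):
    """Piecewise quadratic objective from Q4 (m = 1, L = 25)."""
    if x < 1:
        return 12.5 * x**2
    elif x < 2:
        return 0.5 * x**2 + 24.0 * x - 12.0
    else:
        return 12.5 * x**2 - 24.0 * x + 36.0

def grad_f(x):
    """Gradient of f; piecewise linear and continuous."""
    if x < 1:
        return 25.0 * x
    elif x < 2:
        return x + 24.0
    else:
        return 25.0 * x - 24.0
```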
Extra Credit Question
Q 5. Suppose that I give you an algorithm (let’s call it AGD-G) that, given an initial point $x_0 \in \mathbb{R}^n$ and gradient access to an $L$-smooth function $f: \mathbb{R}^n \to \mathbb{R}$ (where $0 < L < \infty$), after $k$ iterations returns a point $x_k \in \mathbb{R}^n$ that satisfies:
$$\|\nabla f(x_k)\|_2 \le \sqrt{\frac{2L\big(f(x_0) - f(x^*)\big)}{(k+1)^2}}.$$
Note that AGD-G does not need to know the value of L.
Show that you can use AGD-G to obtain an algorithm that, for any $m$-strongly convex and $L$-smooth function and any $\epsilon > 0$, can construct a point $x_k$ with $f(x_k) - f(x^*) \le \epsilon$ in $k = O\!\left(\sqrt{\tfrac{L}{m}}\,\log\!\left(\tfrac{L\|x_0 - x^*\|_2^2}{\epsilon}\right)\right)$ iterations. Your algorithm should work without knowledge of the values of $L$ and $m$. [15pts]