Problem 1 (A One-Dimensional Problem):
(approx. 20 pts)
We consider the optimization problem
min_x  f(x) := −(x − 1)² log((x + 1)/5) − x/(2(x − 1))     s.t.   x ∈ [1.4, 5].
Implement the bisection or the golden section method to solve this problem and output a solution with accuracy at least 10⁻⁵.
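For orientation, a golden-section search over this interval might be sketched in Python as follows; the helper name golden_section is illustrative only, and the objective handle simply transcribes the function stated above.

import numpy as np

def golden_section(phi, a, b, tol=1e-5, maxit=1000):
    # Minimize a unimodal function phi on [a, b] by golden-section search.
    rho = (np.sqrt(5.0) - 1.0) / 2.0           # inverse golden ratio, about 0.618
    x1, x2 = b - rho * (b - a), a + rho * (b - a)
    f1, f2 = phi(x1), phi(x2)
    for _ in range(maxit):
        if b - a <= tol:
            break
        if f1 <= f2:                           # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - rho * (b - a)
            f1 = phi(x1)
        else:                                  # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + rho * (b - a)
            f2 = phi(x2)
    return 0.5 * (a + b)

# objective of Problem 1 on the interval [1.4, 5]
f = lambda x: -(x - 1.0)**2 * np.log((x + 1.0) / 5.0) - x / (2.0 * (x - 1.0))
x_star = golden_section(f, 1.4, 5.0, tol=1e-5)
print(x_star, f(x_star))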
Problem 2 (The Gradient Method):
(approx. 30 pts)
In this exercise, we want to solve the optimization problem
min_{x∈R2}  f(x) := (1/3)x1⁴ + 2x1³ + (1/2)x1² − 2x1²x2 + (4/3)x2²        (1)
via the gradient descent method. (This is Problem 1 discussed on Sheet 6.)
Implement the gradient method that was presented in the lecture as a function gradient_method in MATLAB or Python.
The following input functions and parameters should be considered:
• obj, grad – function handles that calculate and return the objective function f(x) and the gradient ∇f(x) at an input vector x ∈ Rn. You can treat these handles as functions or fields of a class or structure f or you can use f and ∇f directly in your code. (For example, your function can have the form gradient_method(obj,grad,...)).
• x0 – the initial point.
• tol – a tolerance parameter. The method should stop whenever the current iterate xk satisfies the criterion ‖∇f(xk)‖ ≤ tol.
We want to investigate the performance of the gradient method for different step size strategies. In particular, we want to test and compare backtracking and exact line search. The following parameters will be relevant for these strategies:
• σ, γ ∈ (0, 1) – parameters for backtracking and the Armijo condition. (At iteration k, we choose αk as the largest element in {1, σ, σ², ...} satisfying the condition f(xk − αk∇f(xk)) − f(xk) ≤ −γαk · ‖∇f(xk)‖².)
• You can use the golden section method to determine the exact step size αk. The parameters for the golden section method are: maxit (maximum number of iterations), tol (stopping tolerance), and [0, a] (the interval for the step size).
You can organize the latter parameters in an appropriate options class or structure. It is also possible to implement separate algorithms for backtracking and exact line search. The method(s) should return the final iterate xk that satisfies the stopping criterion.
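As one possible reading of this interface, a backtracking variant of gradient_method could be organized roughly as follows; returning the list of iterates in addition to the final point is an optional design choice that makes the path plots in part b) easier to produce, and the maxit safeguard is not prescribed by the sheet.

import numpy as np

def gradient_method(obj, grad, x0, tol=1e-5, maxit=10000, sigma=0.5, gamma=0.1):
    # Gradient descent with backtracking (Armijo) line search.
    # Stops as soon as ||grad f(xk)|| <= tol; maxit is only a safeguard.
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(maxit):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        # backtracking: largest alpha in {1, sigma, sigma^2, ...} with
        # f(x - alpha*g) - f(x) <= -gamma * alpha * ||g||^2
        alpha, fx, gg = 1.0, obj(x), g @ g
        while obj(x - alpha * g) - fx > -gamma * alpha * gg:
            alpha *= sigma
        x = x - alpha * g
        path.append(x.copy())
    return x, path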
a) Apply the gradient method with backtracking and parameters (σ, γ) = (0.5, 0.1) and exact line search (maxit = 100, tol = 10⁻⁶, a = 2) to solve the problem min_x f(x).
The algorithms should use the stopping tolerance tol = 10⁻⁵. Test the methods using the initial point x0 = (3, 3)⊤ and report the behavior and performance of the methods. In particular, compare the number of iterations and the point to which the different gradient descent methods converged.
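For example, with the partial derivatives of (1) written out by hand (they should be checked against (1)), part a) can be run along the following lines, reusing the gradient_method sketch from above; the exact line search variant can reuse the golden-section routine from Problem 1 applied to φ(α) = f(xk − α∇f(xk)) on the interval [0, a].

import numpy as np

# objective (1) and its gradient, derived by hand from (1)
obj  = lambda x: x[0]**4 / 3.0 + 2.0*x[0]**3 + 0.5*x[0]**2 - 2.0*x[0]**2*x[1] + (4.0/3.0)*x[1]**2
grad = lambda x: np.array([(4.0/3.0)*x[0]**3 + 6.0*x[0]**2 + x[0] - 4.0*x[0]*x[1],
                           -2.0*x[0]**2 + (8.0/3.0)*x[1]])

x_bt, path_bt = gradient_method(obj, grad, x0=[3.0, 3.0], tol=1e-5, sigma=0.5, gamma=0.1)
print(len(path_bt) - 1, "iterations, limit point:", x_bt)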
b) Let us define the set of initial points
X⁰ := { (−3, −3)⊤, (3, −3)⊤, (−3, 3)⊤, (3, 3)⊤ }.
Run the methods:
– Gradient descent method with backtracking and (σ, γ) = (0.5, 0.1),
– Gradient method with exact line search and maxit = 100, tol = 10⁻⁶, a = 2,
again for every initial point in the set X⁰ using the tolerance tol = 10⁻⁵. For each algorithm/step size strategy, create a single figure that contains all of the solution paths generated for the different initial points. The initial points and limit points should be clearly visible. Add a contour plot of the function f in the background of each figure.
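Assuming each run returns its list of iterates, as in the gradient_method sketch above, one way to assemble such a figure with matplotlib is shown below; the grid range and the number of contour levels are arbitrary choices.

import numpy as np
import matplotlib.pyplot as plt

X0 = [np.array([-3.0, -3.0]), np.array([3.0, -3.0]),
      np.array([-3.0, 3.0]),  np.array([3.0, 3.0])]

# contour plot of f in the background (obj and grad as defined for part a))
xx, yy = np.meshgrid(np.linspace(-4, 4, 400), np.linspace(-4, 4, 400))
plt.contour(xx, yy, obj(np.array([xx, yy])), levels=50)

for x0 in X0:
    _, path = gradient_method(obj, grad, x0, tol=1e-5, sigma=0.5, gamma=0.1)
    path = np.array(path)
    plt.plot(path[:, 0], path[:, 1], '-o', markersize=2)
    plt.plot(path[0, 0], path[0, 1], 'ks')    # initial point
    plt.plot(path[-1, 0], path[-1, 1], 'r*')  # limit point
plt.title('Gradient method with backtracking')
plt.show()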
Problem 3 (A Limited-Memory Version of Adagrad): (approx. 25 pts)
In this exercise, we investigate a limited-memory version of the gradient descent method with adaptive diagonal scaling. The update of the method is given by
xk+1 = xk − αk Dk⁻¹ ∇f(xk),        (2)
where αk ≥ 0 is a suitable step size and Dk ∈ Rn×n is a diagonal matrix that is chosen as follows:
Dk = diag(v1k, v2k, ..., vnk)   and   vik = sqrt( ε + Σ_{j=tm(k)}^{k} (∇f(xj))i² ),   tm(k) = max{0, k − m},   ∀ i = 1, ..., n.
Here, ε > 0, the memory constant m ∈ N, and the initial point x0 ∈ Rn are given parameters.
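In code, these diagonal entries can be maintained from a short window of stored gradients; a small sketch under the definitions above (the function name and variable names are arbitrary):

import numpy as np

def adagrad_scaling(grad_history, eps=1e-6, m=25):
    # grad_history = [grad f(x0), ..., grad f(xk)]; only the gradients with
    # indices j = tm(k), ..., k, i.e. the last min(k, m) + 1 entries, enter the sum.
    window = np.array(grad_history[-(m + 1):])        # shape (k - tm(k) + 1, n)
    return np.sqrt(eps + np.sum(window**2, axis=0))   # vector (v1k, ..., vnk)

# the update (2) then reads, componentwise: x_new = x - alpha * grad_f_x / adagrad_scaling(...)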
a) Show that the direction dk = −Dk⁻¹∇f(xk) is a descent direction for all k ∈ N (assuming ∇f(xk) ≠ 0).
b) Write MATLAB or Python code and implement the gradient method with adaptive diagonal scaling (Adagrad) and backtracking (i.e., implement the abstract descent method with dk as the descent direction). Run Adagrad on problem (1) introduced in the last exercise and compare its performance with the approaches tested in Problem 2, using the same initial points and stopping tolerance as in Problem 2 b). You can use the following parameters:
– (σ, γ) = (0.5, 0.1) – parameters for backtracking.
– ε = 10⁻⁶ – scaling parameter in the update (2).
– m = 25 – memory parameter.
Generate a plot of the different solution paths of Adagrad using the different initial points from X⁰ and add a contour plot of f in the background.
c) How does Adagrad behave when different memory parameters m are chosen? (Suitable other choices might include m ∈ {5, 10, 15, 25, ..., maxit}.) Does the performance or convergence behavior change?
Problem 4 (Globalized Newton’s Method): (approx. 25 pts)
Implement the globalized Newton method with backtracking that was presented in the lecture as a function newton_glob in MATLAB or Python.
The pseudo-code for the full Newton method is given as follows:
Algorithm 1: The Globalized Newton Method
• Initialization: Select an initial point x0 ∈ Rn, parameters γ, γ1, γ2, σ ∈ (0, 1), and tol.
for k = 0, 1, ... do
• If ‖∇f(xk)‖ ≤ tol, then STOP and xk is the output.
• Compute the Newton direction sk as the solution of the linear system of equations:
∇²f(xk) sk = −∇f(xk).
• If −∇f(xk)⊤sk ≥ γ1 min{1, ‖sk‖^γ2} ‖sk‖², then accept the Newton direction and set dk = sk.
Otherwise set dk = −∇f(xk).
• Choose a step size αk by backtracking and calculate xk+1 = xk + αkdk.
The following input functions and parameters should be considered:
• obj, grad, hess – function handles that calculate and return the objective function f(x), the gradient ∇f(x), and the Hessian ∇²f(x) at an input vector x ∈ Rn. You can treat these handles as functions or fields of a class or structure f or you can use the f, ∇f, and ∇²f from part a) and b) directly in the algorithm. (For example, your function can have the form newton_glob(obj,grad,hess,...)).
• x0 – the initial point.
• tol – a tolerance parameter. The method should stop whenever the current iterate xk satisfies the criterion ‖∇f(xk)‖ ≤ tol.
• γ1, γ2 > 0 – parameters for the Newton condition.
• σ, γ ∈ (0, 1) – parameters for backtracking and the Armijo condition.
You can again organize the latter parameters in an appropriate options class or structure. You can use the backslash operator A\b in MATLAB or numpy.linalg.solve(A,b) to solve the linear system of equations Ax = b. If the computed Newton step sk = −∇²f(xk)⁻¹∇f(xk) is a descent direction and satisfies
−∇f(xk)⊤sk ≥ γ1 min{1, ‖sk‖^γ2} ‖sk‖²,
we accept it as the next direction dk. Otherwise, the gradient direction dk = −∇f(xk) is chosen. The method should return the final iterate xk that satisfies the stopping criterion.
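A compact Python sketch following Algorithm 1 could look as follows; the interface mirrors gradient_method above, and the fallback for a numerically singular Hessian is one possible safeguard rather than part of the algorithm as stated.

import numpy as np

def newton_glob(obj, grad, hess, x0, tol=1e-7, maxit=1000,
                sigma=0.5, gamma=1e-4, gamma1=1e-6, gamma2=0.1):
    # Globalized Newton method with backtracking (Algorithm 1).
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        # Newton step: solve hess(x) * s = -g; fall back to -g if the solve fails
        try:
            s = np.linalg.solve(hess(x), -g)
        except np.linalg.LinAlgError:
            s = -g
        ns = np.linalg.norm(s)
        # acceptance test for the Newton direction
        d = s if -g @ s >= gamma1 * min(1.0, ns**gamma2) * ns**2 else -g
        # backtracking along d with the Armijo condition f(x + a*d) - f(x) <= gamma * a * g'd
        alpha, fx, slope = 1.0, obj(x), g @ d
        while obj(x + alpha * d) - fx > gamma * alpha * slope:
            alpha *= sigma
        x = x + alpha * d
    return x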
a) Test your approach on the Rosenbrock function f : R2 → R
f(x) = 100(x2 − x1²)² + (1 − x1)²
with initial point x0 = (−1, −0.5)⊤ and parameters (σ, γ) = (0.5, 10⁻⁴) and (γ1, γ2) = (10⁻⁶, 0.1). (Notice that γ is smaller here.) Besides the globalized Newton method, also run
the gradient method with backtracking ((σ, γ) = (0.5, 10⁻⁴)) on this problem and compare the performance of the two approaches using the tolerance tol = 10⁻⁷.
Does the Newton method always utilize the Newton direction? Does the method always use full step sizes αk = 1?
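The handles for this test can be written down directly; the gradient and Hessian below are the standard closed-form derivatives of the Rosenbrock function (worth double-checking), and the call reuses the newton_glob sketch from above.

import numpy as np

obj  = lambda x: 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2
grad = lambda x: np.array([-400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
                            200.0 * (x[1] - x[0]**2)])
hess = lambda x: np.array([[1200.0 * x[0]**2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
                           [-400.0 * x[0],                          200.0]])

x_newton = newton_glob(obj, grad, hess, x0=[-1.0, -0.5],
                       tol=1e-7, sigma=0.5, gamma=1e-4, gamma1=1e-6, gamma2=0.1)
print(x_newton)   # the minimizer of the Rosenbrock function is (1, 1)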
b) Repeat the performance test described in Problem 2 b) and Problem 3 b) for problem (1) using the globalized Newton method. You can use the parameters (σ, γ) = (0.5, 0.1), (γ1, γ2) = (10⁻⁶, 0.1), and tol = 10⁻⁵.
Plot all of the solution paths obtained by Newton’s method for the different initial points in X⁰ in one figure (with a contour plot of f in the background).
Information: MATLAB and Python Code
• For those questions that ask you to write code to solve the problem, please attach the code to the homework. Please also state the optimal solution and the optimal value or show the obtained plot / figure in your solution sheet (depending on the specific question).
Sheet 8 is due on May 6th. Submit your solutions before May 6th, 12:00 pm (noon).