Gradient Descent

Starting from:

$30

Let L: R ! R be a function, and let:

(1) L = minw2R L(w) be its minimum value
(2) w = arg minw2R L(w) be the minimiser, i.e., the argument for which the minimum value is attained (note that L = L(w )).

The gradient descent algorithm to find w of a function L(w) is given as follows:

wt+1 = wt
dL(w)
jw=wt ;
(1)

dw

where

(i) > 0 is a step-size parameter

(ii) dLdw(w) jw=wt is the derivative of L(w) with respect to w evaluated at w = wt.

Let L(w) = a2 w2 + bw + c.
Q1) [20 Marks] Plot the function L(w) (in red) L = L(w ) (blue dot) and w (green star), for the following cases:

(1) a = 1; b = 0; c = 0.

(2) a = 0:1; b = 0; c = 0

(3) a = 10; b = 0; c = 0

(4) a = 1; b = 0; c = 10

(5) a = 1; b = 1; c = 1

(6) a = 0:1; b = 1; c = 1

Calculate L and w with pen and paper. Choose appropriate axis range so that L and w can be displayed in a proper manner.

Q2) [30 Marks] Run the gradient descent algorithm for t = 1; : : : ; T iterations (try various values for

• such as 10; 100; 1000). For each iteration display

(1) The complete function L(w) (for w in an appropriate range used in [Q1] ), and L(wt) in one plot

(2) wt w in a different plot.

Run the gradient descent algorithm for various values of such that : (i) the iterates wt are on the same side of w and converge to w , (ii) the iterates wt oscillate on both sides of w and converge to w , (iii) the iterates wt oscillate on both sides of w , but diverge to 1. Please derive these cases with pen and paper first and then proceed to code.

Q3) [25 Marks] Create your own function with multiple local minima, and compare (i) gradient descent versus (ii) gradient descent with momentum.

Q4) [25 Marks] Gradient Descent in 2D: Let w 2 R2. Consider the functions f1(w) =
1

w(1)2
+

2

1
w(2)2, f2(w) =
10
w(1)2 +
1
w(2)2, f3(w) =
1
w(1)2 +
10
w(2)2, f4(w) =
1
w(1)2 +
1
w(2)2
+

2

2

2

2

2

2

2

5w(1) 3w(2) 2. For the functions fi; i = 1; : : : ; 4

a) Show the negative gradient directions and contour plots.

b) Perform gradient descent to find the minima and show the trajectories of the gradient descent algorithm. Use different step-sizes to demonstrate (i) no oscillation, (ii) oscillation in one coordinate (iii) one-sided, oscillatory and divergent modes of convergence behavior.

2

More products

Problem Set V Solution

$30

Buy now

Problem Set IV Solution

$35

Buy now

Problem Set III Solution

$30

Buy now