Starting from:

$30

Gradient Descent

 Let L: R ! R be a function, and let:


    (1) L  = minw2R L(w) be its minimum value
    (2) w = arg minw2R L(w) be the minimiser, i.e., the argument for which the minimum value is attained (note that L = L(w )).

The gradient descent algorithm to find w of a function L(w) is given as follows:

wt+1 = wt
dL(w)
jw=wt ;
(1)

dw




where

    (i) > 0 is a step-size parameter

    (ii) dLdw(w) jw=wt is the derivative of L(w) with respect to w evaluated at w = wt.


Let L(w) = a2 w2 + bw + c.
Q1) [20 Marks] Plot the function L(w) (in red) L = L(w ) (blue dot) and w (green star), for the following cases:

    (1) a = 1; b = 0; c = 0.

    (2) a = 0:1; b = 0; c = 0

    (3) a = 10; b = 0; c = 0

    (4) a = 1; b = 0; c = 10

    (5) a = 1; b = 1; c = 1

(6) a = 0:1; b =    1; c =    1

Calculate L and w with pen and paper. Choose appropriate axis range so that L and w can be displayed in a proper manner.

Q2) [30 Marks] Run the gradient descent algorithm for t = 1; : : : ; T iterations (try various values for

    • such as 10; 100; 1000). For each iteration display

(1) The complete function L(w) (for w in an appropriate range used in [Q1] ), and L(wt) in one plot

(2) wt    w in a different plot.

Run the gradient descent algorithm for various values of such that : (i) the iterates wt are on the same side of w and converge to w , (ii) the iterates wt oscillate on both sides of w and converge to w , (iii) the iterates wt oscillate on both sides of w , but diverge to 1. Please derive these cases with pen and paper first and then proceed to code.

Q3) [25 Marks] Create your own function with multiple local minima, and compare (i) gradient descent versus (ii) gradient descent with momentum.

Q4) [25 Marks] Gradient Descent in 2D: Let w 2 R2. Consider the functions f1(w) =
1

w(1)2
+

2




1
w(2)2, f2(w) =
10
w(1)2 +
1
w(2)2, f3(w) =
1
w(1)2 +
10
w(2)2, f4(w) =
1
w(1)2 +
1
w(2)2
+

2

2

2

2

2

2

2


5w(1)  3w(2)  2. For the functions fi; i = 1; : : : ; 4

a) Show the negative gradient directions and contour plots.


    b) Perform gradient descent to find the minima and show the trajectories of the gradient descent algorithm. Use different step-sizes to demonstrate (i) no oscillation, (ii) oscillation in one coordinate (iii) one-sided, oscillatory and divergent modes of convergence behavior.































































2

More products