Goal: minimize a multivariate function f(x) using a gradient-based method with backtracking line search.
(a) Use backtracking as described in class to compute step lengths (so you need to set the parameters s, σ, and β). Conditions: s > 0, σ ∈ (0,1), β ∈ (0,1)
1. Start with the initial guess for the step length: α⁽ᵏ⁾ = s > 0
2. If f(x⁽ᵏ⁾) − f(x⁽ᵏ⁾ + α⁽ᵏ⁾d⁽ᵏ⁾) ≥ −σ α⁽ᵏ⁾ ∇f(x⁽ᵏ⁾)ᵀd⁽ᵏ⁾
▪ α⁽ᵏ⁾ decreases the function sufficiently ⇒ the current α⁽ᵏ⁾ is chosen
3. Otherwise, repeatedly reduce α⁽ᵏ⁾ by multiplying by β ∈ (0,1) until the condition in step 2 is met (a code sketch of this loop follows the list):
▪ α⁽ᵏ⁾ = β·α⁽ᵏ⁾
Observation (take f₁(x) as an example):
f₁(x) = x₁² + x₂² + x₃², with x⁽⁰⁾ = (1,1,1)ᵀ
• Let σ = 0.25 and s = 2. The larger β is, the larger the final step length α⁽ᵏ⁾ that is chosen: a larger β shrinks α⁽ᵏ⁾ by a smaller amount per trial, so a larger α⁽ᵏ⁾ has a higher chance of satisfying the condition in step 2.
• Let β = 0.6 and s = 2. The smaller σ is, the larger the final step length α⁽ᵏ⁾ that is chosen: a smaller σ demands less decrease, so a larger α⁽ᵏ⁾ already decreases the function "sufficiently".
• Try different combinations of σ & β.
Convergence is slow when α⁽ᵏ⁾ is small, and the iterates might step past values satisfying the stopping condition when α⁽ᵏ⁾ is large. Set s = 2, σ = 0.25, β = 0.5 in part (d) for consistency; we could adjust the values of s, σ & β if the function does not meet the termination criterion before hitting the maximum number of iterations.
Results for f₁ with σ = 0.25, s = 2 and three values of β:

| (σ, β) | step length α⁽ᵏ⁾ | Iteration k | x⁽ᵏ⁾ | Gradient at x⁽ᵏ⁾ |
|---|---|---|---|---|
| (0.25, 0.9) | 0.8609 | 40 | [0.00000218 0.00000218 0.00000218] | [0.00000436 0.00000436 0.00000436] |
| (0.25, 0.5) | 0.5 | 1 | [0. 0. 0.] | [0. 0. 0.] |
| (0.25, 0.1) | 0.0200 | 313 | [0.00000282 0.00000282 0.00000282] | [0.00000565 0.00000565 0.00000565] |
(b) Use ‖∇f(x⁽ᵏ⁾)‖ / (1 + |f(x⁽ᵏ⁾)|) ≤ ε = 10⁻⁵ as a stopping condition, or stop if the number of iterations hits 1000.
The gradient might never reach exactly zero but only get close to it, so the stopping condition allows some tolerance. The gradient-based method might also take a long time to converge, so cap the number of iterations at 1000 to bound the computational cost.
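A sketch of this termination test (the helper name should_stop is illustrative; f_val is the current objective value and grad the current gradient as a numpy array):

```python
import numpy as np

def should_stop(f_val, grad, k, eps=1e-5, max_iter=1000):
    # Stop when the scaled gradient norm is within eps of 0,
    # or when the iteration cap from part (b) is reached.
    return np.linalg.norm(grad) / (1.0 + abs(f_val)) <= eps or k >= max_iter
```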
(c) Print the initial point, and for each iteration print the search direction, the step length, and the new iterate x⁽ᵏ⁺¹⁾. If the number of iterations is more than 15, then print the details of just the first 10 iterations as well as the details of the last 5 iterations before the stopping condition is met. Indicate if the iteration maximum is reached.
The implementation of the gradient-based method with backtracking (a runnable sketch follows this list):
1. Set the initial point x⁽⁰⁾
2. Find the next iterate x⁽ᵏ⁺¹⁾ = x⁽ᵏ⁾ + α⁽ᵏ⁾d⁽ᵏ⁾ = x⁽ᵏ⁾ − α⁽ᵏ⁾∇f(x⁽ᵏ⁾), with backtracking step size α⁽ᵏ⁾ and gradient direction d⁽ᵏ⁾ = −∇f(x⁽ᵏ⁾)
3. Repeat step 2 until the termination criteria are met
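A minimal sketch of the full loop, reusing the backtracking and should_stop helpers above and buffering per-iteration details so that only the first 10 and last 5 are printed when there are more than 15 iterations; this is an illustrative reconstruction, not the exact submitted code:

```python
def gradient_descent(f, grad_f, x0, s=2.0, sigma=0.25, beta=0.5,
                     eps=1e-5, max_iter=1000):
    x, k, log = np.asarray(x0, dtype=float), 0, []
    print("initial point:", x)
    while not should_stop(f(x), grad_f(x), k, eps, max_iter):
        d = -grad_f(x)                    # search direction d^(k) = -grad f(x^(k))
        alpha = backtracking(f, grad_f, x, d, s, sigma, beta)
        x = x + alpha * d                 # x^(k+1) = x^(k) + alpha^(k) d^(k)
        k += 1
        log.append(f"k={k}: d={d}, alpha={alpha:.4g}, x={x}")
    # Part (c): if more than 15 iterations, show only the first 10 and last 5.
    shown = log if len(log) <= 15 else log[:10] + ["..."] + log[-5:]
    print("\n".join(shown))
    if k >= max_iter:
        print("iteration maximum reached")
    return x, k
```

For example, gradient_descent(lambda x: x @ x, lambda x: 2 * x, [1.0, 1.0, 1.0]) runs f₁ from part (a) and should stop after a single iteration at x = [0, 0, 0], matching the (σ, β) = (0.25, 0.5) row of the table above.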
(d) Test your algorithms on the following test problems (set s = 2, σ = 0.25, β = 0.5 for consistency):
1. f₁(x) = x₁² + x₂² + x₃², with x⁽⁰⁾ = (1, 1, 1)ᵀ
2. f₂(x) = x₁² + 2x₂² − 2x₁x₂ − 2x₂, with x⁽⁰⁾ = (0, 0)ᵀ
3. f₃(x) = 100(x₂ − x₁²)² + (1 − x₁)², with x⁽⁰⁾ = (−1.2, 1)ᵀ
4. f₄(x) = (x₁ + x₂)⁴ + x₂², with x⁽⁰⁾ = (2, −3)ᵀ
5. f₅(x) = (x₁ − 1)² + (x₂ − 1)² + c(x₁² + x₂² − 0.25)², with x⁽⁰⁾ = (1, −1)ᵀ
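The five objectives and their gradients as they might be coded for these runs (a sketch; f₅ takes the penalty weight c via a closure):

```python
import numpy as np

f1 = lambda x: x[0]**2 + x[1]**2 + x[2]**2
g1 = lambda x: 2 * x

f2 = lambda x: x[0]**2 + 2*x[1]**2 - 2*x[0]*x[1] - 2*x[1]
g2 = lambda x: np.array([2*x[0] - 2*x[1], 4*x[1] - 2*x[0] - 2])

f3 = lambda x: 100*(x[1] - x[0]**2)**2 + (1 - x[0])**2   # Rosenbrock
g3 = lambda x: np.array([-400*x[0]*(x[1] - x[0]**2) - 2*(1 - x[0]),
                         200*(x[1] - x[0]**2)])

f4 = lambda x: (x[0] + x[1])**4 + x[1]**2
g4 = lambda x: np.array([4*(x[0] + x[1])**3,
                         4*(x[0] + x[1])**3 + 2*x[1]])

def make_f5(c):
    # f5 depends on the penalty weight c (c = 1, 10, 100 below).
    f = lambda x: (x[0] - 1)**2 + (x[1] - 1)**2 + c*(x[0]**2 + x[1]**2 - 0.25)**2
    g = lambda x: np.array([2*(x[0] - 1) + 4*c*x[0]*(x[0]**2 + x[1]**2 - 0.25),
                            2*(x[1] - 1) + 4*c*x[1]*(x[0]**2 + x[1]**2 - 0.25)])
    return f, g
```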
Comment on how larger c affects the performance of the algorithm.

Results for f₅ at termination:

| c | Iteration k | x⁽ᵏ⁾ | Gradient at x⁽ᵏ⁾ |
|---|---|---|---|
| 1 | 11 | [0.56408574 0.56408685] | [-0.00000762, -0.00000367] |
| 10 | 16 | [0.40261231 0.40260761] | [0.00000987, -0.00001345] |
| 100 | 226 | [0.35979117 0.35978795] | [0.00001533, -0.00000256] |
The larger c is, the more iterations are required to converge to the optimal solution in most cases. Using the norm of the final gradient as the performance metric (the larger the norm, the farther the final gradient is from 0), the final gradients in the table above show that the value of c does not affect the attained accuracy much.
(e) Are your computational results consistent with the theory of the gradient-based methods?
Under the assumptions that f(x) is bounded below and that the gradient of f(x) is Lipschitz continuous over ℝⁿ, the computational results are consistent with the theory of gradient-based methods with backtracking. For the functions in part (d) we show

∇f(x⁽ᵏ⁾) → 0 as k → ∞
1. f₁(x) = x₁² + x₂² + x₃², with x⁽⁰⁾ = (1, 1, 1)ᵀ, k = 1:
∇f₁(x⁽ᵏ⁾) = 0, x⁽ᵏ⁾ = x* = [0, 0, 0]
f₁(x*) = 0 ⇒ x* is a global minimum since f₁(x) ≥ 0
2. f₂(x) = x₁² + 2x₂² − 2x₁x₂ − 2x₂, with x⁽⁰⁾ = (0, 0)ᵀ, k = 33:
∇f₂(x⁽ᵏ⁾) = [−0.00001526, 0], x⁽ᵏ⁾ = [0.99998474, 0.99999237]
As k → ∞ (if there were no stopping criteria), x⁽ᵏ⁾ → x* = [1, 1] and ∇f₂(x⁽ᵏ⁾) → 0.
The Hessian ∇²f₂(x) = [2 −2; −2 4] is positive definite ⇒ f₂ is convex
◦ x* is a global minimum
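A quick numerical check of positive definiteness (a sketch; numpy's eigvalsh applies since the Hessian is symmetric):

```python
import numpy as np

H = np.array([[2.0, -2.0],
              [-2.0, 4.0]])    # Hessian of f2 (constant, since f2 is quadratic)
print(np.linalg.eigvalsh(H))   # both eigenvalues (3 ± sqrt(5)) are positive => PD
```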
3. f₃(x) = 100(x₂ − x₁²)² + (1 − x₁)², with x⁽⁰⁾ = (−1.2, 1)ᵀ, k = 421:
∇f₃(x⁽ᵏ⁾) = [0.0000036, 0.00000819], x⁽ᵏ⁾ = [1.00000999, 1.00002003]
As k → ∞ (if there were no stopping criteria), x⁽ᵏ⁾ → x* = [1, 1] and ∇f₃(x⁽ᵏ⁾) → 0.
f₃(x*) = 0 ⇒ x* is a global minimum since f₃(x) ≥ 0
4. f₄(x) = (x₁ + x₂)⁴ + x₂², with x⁽⁰⁾ = (2, −3)ᵀ, k = 617:
∇f₄(x⁽ᵏ⁾) = [0.00000997, −0.00000019], x⁽ᵏ⁾ = [0.01356503, −0.00000508]
As k → ∞ (if there were no stopping criteria), x⁽ᵏ⁾ → x* = [0, 0] and ∇f₄(x⁽ᵏ⁾) → 0.
f₄(x*) = 0 ⇒ x* is a global minimum since f₄(x) ≥ 0
5. f₅(x) = (x₁ − 1)² + (x₂ − 1)² + c(x₁² + x₂² − 0.25)², with x⁽⁰⁾ = (1, −1)ᵀ:
(final iterates and gradients for c = 1, 10, 100 are in the table in part (d))
As k → ∞ (if there were no stopping criteria), ∇f₅(x⁽ᵏ⁾) → 0 for each value of c.
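The runs in parts (d) and (e) could be reproduced with a driver like the following, reusing gradient_descent and the test-function sketches above (initial points taken from part (d)):

```python
problems = [
    (f1, g1, [1.0, 1.0, 1.0]),
    (f2, g2, [0.0, 0.0]),
    (f3, g3, [-1.2, 1.0]),
    (f4, g4, [2.0, -3.0]),
]
for c in (1, 10, 100):
    f5, g5 = make_f5(c)
    problems.append((f5, g5, [1.0, -1.0]))

for f, g, x0 in problems:
    x, k = gradient_descent(f, g, x0)   # s=2, sigma=0.25, beta=0.5 defaults
    print(f"stopped at k={k}, x={x}, grad={g(x)}")
```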