Starting from:
$30

$24

Artificial Intelligence Solved

Recall from lecture1 that Sam is either t or un t




= f t, un tg



and has to decide whether to exercise or relax




= fexercise, relaxg



on the basis of the following (probability, reward)-matrices (p(s; a; s0); r(s; a; s0))

for row s, column s0 in table with corner a






exercise
t
un t


relax
t
un t


t
.99, 8
.01, 8


t
.7, 10
.3, 10


un t
.2, 0
.8, 0


un t
0, 5
1, 5
The -discounted value of (s; a) is














lim
qn(s; a)




n!1




where




q0(s; a) := p(s; a; t)r(s; a; t) + p(s; a; un t)r(s; a; un t)

Vn(s) := max(qn(s; exercise); qn(s; relax))

qn+1(s; a) := q0(s; a) + p(s; a; t)Vn( t) + p(s; a; un t)Vn(un t) :




In particular, = 0:9 leads to the following qn(s; a) for n = 0; 1; 2






exercise
relax










t
8, 16.955, 23.812
10, 17.65, 23.685
relax, relax, exercise
un t
0, 5.4, 10.017
5, 9.5, 13.55
relax, relax, relax











Your task is to write a program that given




a positive integer n, a -setting G (0 < G < 1), and a state s




returns the values




qn(s; exercise) and qn(s; relax)




for = G. You may use any of the following programming languages




Prolog, Java, Python




but be prepared to demonstrate your program on Tue, March 6 (noon-1, LG 12, O’Reilly) or Wed, March 7 (10-11, LB04; on your machine).







It may help to read Poole & Mackworth, 9.5 Decision Processes.

More products