Lab Objective:
In this lab, you will learn the temporal difference (TD) learning algorithm by solving the 2048 game with an n-tuple network.
Turn in:
1. Experiment report (.pdf)
2. Source code [NOT including model weights]
Notice: zip all files into a single archive named “DLP_LAB2_StudentId_Name.zip”,
e.g., “DLP_LAB2_0856738_鄭紹雄.zip”.
Lab Description:
• Understand the concept of (before-)state and after-state.
• Learn to construct and design an n-tuple network.
• Understand the TD algorithm.
• Understand Q-learning network training.
Requirements:
• Implement TD(0) algorithm
◦ Construct an n-tuple network
◦ Select actions according to the n-tuple network
◦ Calculate the TD target and TD error
◦ Update V(state), not V(after-state).
◦ Understand the mechanisms of temporal difference learning
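The TD target and TD error for V(state) can be sketched as a single update step. This is a minimal illustration that makes two assumptions. First, it uses a plain dictionary as a stand-in for the n-tuple network. Second, the function name `td0_update` and its `terminal` flag are hypothetical. The flag treats a terminal state's value as 0, which is an assumption the lab pseudocode does not state.

```python
from typing import Dict, Hashable


def td0_update(V: Dict[Hashable, float], s: Hashable, r: float, s_next: Hashable,
               alpha: float = 0.1, terminal: bool = False) -> float:
    """Apply one TD(0) update to V(s) in place and return the TD error."""
    # Assumption: a terminal next state is worth 0.
    v_next = 0.0 if terminal else V.get(s_next, 0.0)
    td_target = r + v_next                  # no discounting, as in the lab pseudocode
    td_error = td_target - V.get(s, 0.0)    # how far the current estimate is off
    V[s] = V.get(s, 0.0) + alpha * td_error
    return td_error
```

For example, consider V(s) = 10, V(s″) = 20, and a move reward r = 4. The TD target is 24 and the TD error is 14, so with α = 0.1 the estimate V(s) moves to 11.4.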
Deep Learning and Practice 2021 Spring; NYCU CGI Lab
Game Environment – 2048:
• Introduction: 2048 is a single-player sliding block puzzle game. The game's objective is to slide numbered tiles on a grid to combine them to create a tile with the number 2048.
• Actions: Up, Down, Left, Right
• Reward: The score is the value of the new tile created when two tiles are combined.
• (Figure: a sample two-step state transition.)
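The action and reward rules above can be sketched as a slide-and-merge routine, which is the COMPUTE AFTERSTATE step used later in the pseudocode. This sketch makes three assumptions that the lab does not fix. The board is a tuple of 16 cells in row-major order. Each cell stores log2 of its tile, with 0 meaning empty. The directions are given as strings.

```python
from typing import Dict, List, Tuple

Board = Tuple[int, ...]  # assumption: 16 cells, row-major, each holds log2(tile), 0 = empty


def slide_row_left(row: List[int]) -> Tuple[List[int], int]:
    """Slide one line toward index 0, merging each pair of equal tiles at most once.

    Returns the new line and the reward (sum of the values of newly created tiles).
    """
    tiles = [x for x in row if x != 0]
    out: List[int] = []
    reward = 0
    i = 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged = tiles[i] + 1      # two 2^k tiles become one 2^(k+1) tile
            out.append(merged)
            reward += 1 << merged      # reward = value of the new tile
            i += 2
        else:
            out.append(tiles[i])
            i += 1
    out += [0] * (len(row) - len(out))
    return out, reward


def lines(direction: str) -> List[List[int]]:
    """Cell indices of each line, ordered so that tiles move toward the first index."""
    rows = [[r * 4 + c for c in range(4)] for r in range(4)]
    cols = [[r * 4 + c for r in range(4)] for c in range(4)]
    table: Dict[str, List[List[int]]] = {
        "left": rows,
        "right": [line[::-1] for line in rows],
        "up": cols,
        "down": [line[::-1] for line in cols],
    }
    return table[direction]


def compute_afterstate(board: Board, direction: str) -> Tuple[Board, int]:
    """COMPUTE AFTERSTATE(s, a): slide every line in one direction; return (s', r)."""
    cells = list(board)
    total = 0
    for idx in lines(direction):
        new_line, r = slide_row_left([cells[i] for i in idx])
        for i, v in zip(idx, new_line):
            cells[i] = v
        total += r
    return tuple(cells), total
```

With this encoding, a move is legal exactly when the returned after-state differs from the input board. That check is one simple way to build the action set A(s).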
Implementation Details:
Network Architecture
• n-tuple patterns: 4 × 6-tuples with all possible isomorphisms
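"All possible isomorphisms" refers to the 8 symmetries of a 4 × 4 board: 4 rotations, each with or without a mirror flip. The sketch below generates the 8 copies of a pattern given as a list of cell indices. It assumes row-major indexing (0 = top-left, 15 = bottom-right). The example pattern [0, 1, 2, 3, 4, 5], which covers the top two rows' first six cells, is only an illustrative choice.

```python
from typing import List


def rotate(idx: int) -> int:
    """Map a cell index to its position after rotating the 4x4 board 90 degrees clockwise."""
    r, c = divmod(idx, 4)
    return c * 4 + (3 - r)


def reflect(idx: int) -> int:
    """Map a cell index to its position after mirroring the board left-right."""
    r, c = divmod(idx, 4)
    return r * 4 + (3 - c)


def isomorphisms(pattern: List[int]) -> List[List[int]]:
    """Return the 8 symmetric copies of a pattern: 4 rotations x {no flip, flip}."""
    result: List[List[int]] = []
    for flip in (False, True):
        p = [reflect(i) for i in pattern] if flip else list(pattern)
        for _ in range(4):
            result.append(p)
            p = [rotate(i) for i in p]  # rebinding p leaves the appended copy untouched
    return result
```

The isomorphic copies usually share one lookup table, so the network learns a single set of weights per base pattern.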
Training Arguments
• Learning rate: 0.1
◦ Learning rate for each feature of an n-tuple network with m features: 0.1 ÷ m
• Train the network for 500k to 1M episodes
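The training arguments above say to divide the learning rate 0.1 by the number of features. The sketch below shows how that split fits into an n-tuple value function, under three assumptions:

- The value of a board is the sum of one lookup-table entry per feature.
- The class name `NTupleNetwork` is hypothetical.
- Tables are `defaultdict`s keyed by a base-16 packing of tile exponents, which assumes every exponent is at most 15. Real implementations typically use flat arrays of 16^6 floats per pattern instead.

```python
from collections import defaultdict
from typing import DefaultDict, List, Sequence, Tuple

Board = Tuple[int, ...]  # assumption: 16 cells, log2 tile exponents, 0 = empty


class NTupleNetwork:
    """V(board) = sum of one lookup-table entry per feature (tuple)."""

    def __init__(self, features: Sequence[Tuple[int, Sequence[int]]], n_tables: int):
        # features: (table_id, cells); isomorphic copies of a pattern share a table_id
        self.features = [(tid, list(cells)) for tid, cells in features]
        self.tables: List[DefaultDict[int, float]] = [defaultdict(float) for _ in range(n_tables)]

    @staticmethod
    def index(board: Board, cells: Sequence[int]) -> int:
        # Pack the exponents under the tuple into one base-16 key (exponents must be <= 15).
        key = 0
        for i in cells:
            key = key * 16 + board[i]
        return key

    def value(self, board: Board) -> float:
        return sum(self.tables[tid][self.index(board, cells)] for tid, cells in self.features)

    def update(self, board: Board, alpha: float, td_error: float) -> None:
        # Each of the m features receives (alpha / m) * td_error.
        step = alpha / len(self.features) * td_error
        for tid, cells in self.features:
            self.tables[tid][self.index(board, cells)] += step
```

Splitting α over the m features keeps the total change in V(s) at roughly α times the TD error. The change is exact when the features index distinct entries. Without the split, the change would grow with the number of tuples.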
Algorithm:
Pseudocode of the game engine and of training (a modified backward training method):
function PLAY GAME
    score ← 0
    s ← INITIALIZE GAME STATE
    while IS NOT TERMINAL STATE(s) do
        a ← argmax_{a′∈A(s)} EVALUATE(s, a′)
        r, s′, s′′ ← MAKE MOVE(s, a)
        SAVE RECORD(s, a, r, s′, s′′)
        score ← score + r
        s ← s′′
    for (s, a, r, s′, s′′) FROM TERMINAL DOWNTO INITIAL do
        LEARN EVALUATION(s, a, r, s′, s′′)
    return score
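The backward replay at the end of PLAY GAME can be sketched on its own. This version assumes a dictionary value table, uses the TD-state update rule, and treats unseen states as 0. The names `Record` and `learn_episode_backward` are hypothetical.

```python
from typing import Dict, Hashable, List, NamedTuple


class Record(NamedTuple):
    s: Hashable        # before-state s
    a: int             # action taken
    r: float           # reward of the move
    s_after: Hashable  # after-state s'
    s_next: Hashable   # next state s'' (after the random tile)


def learn_episode_backward(V: Dict[Hashable, float], records: List[Record],
                           alpha: float = 0.1) -> None:
    """Replay one episode from the terminal move down to the initial one, applying
    V(s) <- V(s) + alpha * (r + V(s'') - V(s)) to each saved record."""
    for rec in reversed(records):
        target = rec.r + V.get(rec.s_next, 0.0)
        V[rec.s] = V.get(rec.s, 0.0) + alpha * (target - V.get(rec.s, 0.0))
```

The replay order matters. Take a two-move episode where the last move earns 10 and α = 0.5. The backward pass sets V(s1) = 5 first and then V(s0) = 2.5, so the late reward reaches the first state within the same episode. Updating in forward order would leave V(s0) at 0 after this episode.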
function MAKE MOVE(s, a)
    s′, r ← COMPUTE AFTERSTATE(s, a)
    s′′ ← ADD RANDOM TILE(s′)
    return (r, s′, s′′)
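MAKE MOVE can be sketched in Python under a few stated assumptions. The board uses the exponent encoding (0 = empty, 1 = tile 2, 2 = tile 4). ADD RANDOM TILE follows the usual 2048 rule of placing a 2 with probability 0.9 and a 4 with probability 0.1 on a uniformly chosen empty cell; the lab text does not specify these probabilities. COMPUTE AFTERSTATE is passed in as a callable so the sketch stays independent of any particular move implementation.

```python
import random
from typing import Callable, Tuple

Board = Tuple[int, ...]  # assumption: 16 cells, log2 tile exponents, 0 = empty


def add_random_tile(board: Board, rng: random.Random) -> Board:
    """Place a 2 (exponent 1) with probability 0.9, else a 4 (exponent 2),
    on a uniformly chosen empty cell; a full board is returned unchanged."""
    empty = [i for i, v in enumerate(board) if v == 0]
    if not empty:
        return board
    cells = list(board)
    cells[rng.choice(empty)] = 1 if rng.random() < 0.9 else 2
    return tuple(cells)


def make_move(board: Board, action: int,
              compute_afterstate: Callable[[Board, int], Tuple[Board, int]],
              rng: random.Random) -> Tuple[int, Board, Board]:
    """MAKE MOVE(s, a): return (r, s', s'')."""
    after, reward = compute_afterstate(board, action)
    return reward, after, add_random_tile(after, rng)
```

Passing a seeded `random.Random` keeps games reproducible, which helps when comparing training runs.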
TD-state
function EVALUATE(s, a)
    s′, r ← COMPUTE AFTERSTATE(s, a)
    S′′ ← ALL POSSIBLE NEXT STATES(s′)
    return r + Σ_{s′′∈S′′} P(s, a, s′′) V(s′′)
function LEARN EVALUATION(s, a, r, s′, s′′)
    V(s) ← V(s) + α(r + V(s′′) − V(s))
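The expectation in the TD-state EVALUATE can be expanded explicitly. Every empty cell of s′ is equally likely to receive the new tile, and the tile is a 2 or a 4. The sketch assumes the usual 0.9/0.1 split for those two tiles and the exponent board encoding. V and COMPUTE AFTERSTATE are passed in as callables, and the function names are hypothetical.

```python
from typing import Callable, Tuple

Board = Tuple[int, ...]  # assumption: 16 cells, log2 tile exponents, 0 = empty


def expected_next_value(after: Board, V: Callable[[Board], float]) -> float:
    """Sum over s'' of P(s'' | s') * V(s'') for every possible random-tile placement."""
    empty = [i for i, v in enumerate(after) if v == 0]
    if not empty:
        return 0.0  # only reachable for an illegal move; a legal move always leaves an empty cell
    total = 0.0
    for i in empty:
        for exponent, p_tile in ((1, 0.9), (2, 0.1)):  # tile 2 or tile 4 (assumed probabilities)
            cells = list(after)
            cells[i] = exponent
            total += (p_tile / len(empty)) * V(tuple(cells))
    return total


def evaluate_state(board: Board, action: int,
                   compute_afterstate: Callable[[Board, int], Tuple[Board, int]],
                   V: Callable[[Board], float]) -> float:
    """EVALUATE(s, a) for V(state): r + sum over s'' of P(s, a, s'') V(s'')."""
    after, reward = compute_afterstate(board, action)
    return reward + expected_next_value(after, V)
```

Each action needs up to 2 × (number of empty cells) evaluations of V. That is why V(state) action selection is noticeably more expensive than the V(after-state) variant, which needs only one.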
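TD-after-state
function EVALUATE(s, a)
    s′, r ← COMPUTE AFTERSTATE(s, a)
    return r + V(s′)
function LEARN EVALUATION(s, a, r, s′, s′′)
    a_next ← argmax_{a′∈A(s′′)} EVALUATE(s′′, a′)
    s′_next, r_next ← COMPUTE AFTERSTATE(s′′, a_next)
    V(s′) ← V(s′) + α(r_next + V(s′_next) − V(s′))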
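For comparison with the required V(state) version, the after-state update can be sketched as below. Its target comes from the greedy move out of s″ rather than from an expectation over random tiles. The sketch makes three assumptions. It uses a dictionary value table. It takes `legal_actions` and `compute_afterstate` as hypothetical callables. It uses a target of 0 when s″ has no legal move, which the pseudocode does not specify.

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple


def learn_afterstate(V: Dict[Hashable, float], s_after: Hashable, s_next: Hashable,
                     legal_actions: Callable[[Hashable], List[int]],
                     compute_afterstate: Callable[[Hashable, int], Tuple[Hashable, float]],
                     alpha: float = 0.1) -> float:
    """V(s') <- V(s') + alpha * (r_next + V(s'_next) - V(s')); returns the TD error."""
    best: Optional[float] = None
    for a in legal_actions(s_next):
        after_next, r_next = compute_afterstate(s_next, a)
        score = r_next + V.get(after_next, 0.0)  # EVALUATE(s'', a) for V(after-state)
        if best is None or score > best:
            best = score
    target = 0.0 if best is None else best  # assumption: terminal s'' gives target 0
    td_error = target - V.get(s_after, 0.0)
    V[s_after] = V.get(s_after, 0.0) + alpha * td_error
    return td_error
```

The greedy score r_next + V(s′_next) is exactly the TD target, so the sketch reuses it instead of recomputing the after-state of a_next.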
Rule of Thumb:
• You may design your own n-tuple network, but do NOT use a CNN.
• The 2048 tile should appear within 10,000 training episodes.
Scoring Criteria:
Show your work, otherwise no credit will be granted.
• Report (60%)
◦ A plot showing episode scores over at least 100,000 training episodes. (10%)
◦ Describe the implementation and the usage of the n-tuple network. (10%)
◦ Explain the mechanism of TD(0). (5%)
◦ Explain the TD-backup diagram of V(after-state). (5%)
◦ Explain the action selection of V(after-state) in a diagram. (5%)
◦ Explain the TD-backup diagram of V(state). (5%)
◦ Explain the action selection of V(state) in a diagram. (5%)
◦ Describe your implementation in detail. (10%)
◦ Other discussions or improvements. (5%)
• Demo Performance (40%)
◦ The 2048-tile win rate over 1000 games, ⌈winrate2048⌉. (20%)
◦ Questions. (20%)