$29
Objectives
The objective is to solve the three problems associated with an HMM with a discrete observation probability distribution for a sequence of observations that you will generate yourself!
As usual, you can use any statistical package, such as MatLab, R, SAS, or use Python with all the available statistical functions, or functions from different packages.
Task 1: Data Set Generation
Generate a set of observations by simulating an HMM consisting of 4 states and 3 objects, i.e., = 1,2,3,4, = 1,2,3. For this, set the initial vector = 1,0,0,0, and generate your personalized one-step transition probability matrix and event matrix :
00
01
02
03
0(1)
0(2)
0(3)
=
10
11
12
13
, =
1(1)
1(2)
1(3)
,
20
12
22
23
2
2
2
30
13
32
33
(1)
(2)
(3)
3
3
3
(1)
(2)
(3)
As described below.
We first describe how to generate the one-step transition probabilities 78 and the probabilities 7( ), and then we describe how to simulate the resulting HMM in order to generate a sequence of observations that you will use in the subsequent tasks.
Use your student ID number as the seed to the pseudo-random number generator that you will use in the assignment.
Generation of the 78 probabilities
• Draw a pseudo-random number , and set 00 = . Repeat the same for the remaining probabilities on the same row, i.e., 01, 02, and 03. Normalize the resulting probabilities, by dividing each by the sum 00 + 01 + 02 + 03, so that they all add up to 1. That is, set 00 = 00/( 00 + 01 + 02 + 03), 01 = 01/( 00 + 01 + 02 + 03), 02 = 02/( 00 + 01 + 02 + 03), and 03 = 03/( 00 + 01 + 02 + 03).
• Repeat the same for the remaining rows of .
Generation of the 7( ) probabilities
• Draw a pseudo-random number , and set 01 = . Repeat the same for the remaining probabilities on the same row 0 2 , 0 3 . Normalize the resulting probabilities, by dividing
each
,
sum
01+0
2 + 0
3 .
That is,
set
0
1 = 01/( 0
1
+ 0
2 +
by the
+ 0 2
+ 0
3/( 01 + 0
2
+ 0
3 ).
0 3 )
0 2
= 02/( 01
3), 03
= 0
• Repeat the same for the remaining rows of .
Generation of the sequence of observations .
Let us assume that you generated the following values for and :
0.4
0.4
0.1
0.1
and
0.8
0.1
0.1
=
0.2
0.3
0.3
0.2
,
=
0.3
0.4
0.3
0.1
0.1
0.4
0.4
0.1
0.2
0.7 .
0.4
0.3
0.2
0.1
0.2
0.5
0.3
Use the following steps to generate .
= 1:
• The HMM is in state 0 = 1, since we assumed that = 1,0,0,0.
• Draw a pseudo-random number and use the probabilities 0, = 1,2,3 to determine observation 0. That is:
o if ≤ 0.8 then 0 = 1
o If 0.8 < ≤ 0.9 then 0 = 2. o Otherwise, 0 = 3.
= 2:
• The HMM shifts to state , = 1,2,3,4. For this, draw a pseudo-random number and use the branching probabilities corresponding to state 1 to determine the next state 1. That is:
o If ≤ 0.4 then 1 = 1.
o If 0.4 < ≤ 0.8 then 1 = 2. o If 0.8 < ≤ 0.9 then 1 = 3. o Otherwise, 1 = 4.
• Having determined the state, use matrix to determine 1 as described above for = 1.
= 3:
• Assuming that you determined that the HMM shifts to state at time = 2, draw a pseudo-random number and use the branching probabilities corresponding to state to determine the next state 2 as described above for = 2.
• Having determined the next state, use matrix to determine 2 as described above for = 1.
≥ 4
• Repeat above steps for = 3 until you have generated 1000 observations, that is, = 1000. Remember to also record the generated sequence of states.
Task 2: Estimate ( | )
• Given that you know the parameters of the HMM, calculate the probability ( | ) that the sequence of observations = 123312331233 came from the HMM.
• Discuss your results.
Task 3: Estimate the Most Probable Sequence
• Given that you know the parameters of the HMM, calculate the most probable sequence of
states that gave rise to the sequence of observations = 12331233123
• Discuss your results.
Task 4: Train the HMM
• Now forget that you know the parameters of the HMM, and use an HMM solver to estimate its parameters = ( , , ) using the 1000 generated observations, assuming that the HMM has four states, i.e., = 1,2,3,4, and that the number of objects is 3, i.e., = 1,2,3.
• Give the estimated parameters in one or more tables along with their corresponding -values. Compare your results against the actual parameters = ( , , ) of the HMM used to generate the data.
• Discuss your results.
What to submit
You will submit your report along with the code that you wrote to obtain the results. It is very important that you provide enough results to support your conclusions. Conclusions without insufficient results will make you lose grades. Remember: no code sharing!
How to submit your work
Submit your report and code in a single .zip file (not .tar or .7z, or any other format). Your report should should be a single pdf file that contains all your graphs, tables and conclusions.
Grading
The TA will first verify that your code works and produces the results you submitted. The break down of the grades will be as follows:
Task 1: 30 points
Task 2: 20 points
Task 3: 20 points
Task4: 30 points
Extra credit
Assume now that you do not know the number of states of the HMM, but you know that the number of objects is 3, i.e., = 1,2,3. Train different HMMs each with a different number of states, starting with an HMM with two states, and for each HMM calculate the likelihood, AIC, and BIC. Plot these three quantities as a function of the number of states. Keep increasing the number of states until you begin to discern a pattern in each of the three plots. Select the best HMM. Discuss your results.
Note that the number of parameters increases as the number of states increases. A rule of thumb is that for each parameter, you need at least 10 observations. As you increase the number of states,
you may require more than 1000 observations. In this case, simply generate additional observations as described above.
How to submit your extra credit
Submit your report and any additional code to the one you used above in tasks 1 to 4, in a single .zip file (not .tar or .7z, or any other format). Your report should should be a single pdf file.
Submit your .zip file separately from project 5, where it says: “Submit your extra credit work”.
Grading of extra credit
This task will be graded separately from the above assignment from 0 to 100. The grade will then be transformed to the range [0,3], and it will be added to your final grade for the class. For instance, if you get a 3 for the extra credit and you class grade is 83, then your final grade will be 86. This extra credit typically will push your final grade up one +/- bracket, but not always. See also syllabus for the +/- grading scale.
Remember that you will be graded mostly on your ability to interpret the results