Problem Set #6

• Feel free to talk to other members of the class in doing the homework. I am more concerned that you learn how to solve the problem than that you demonstrate that you solved it entirely on your own. You should, however, write down your solution yourself. Please try to keep the solution brief and clear.

 

• Please use Piazza first if you have questions about the homework. Also feel free to send us e-mails and come to office hours.

 

• Please, no handwritten solutions. You will submit your solution manuscript as a single pdf file.

 

• The homework is due at 11:59 PM on the due date. We will be using Compass for collecting the homework assignments. Please submit your solution manuscript as a pdf file via Compass (http://compass2g.illinois.edu). Please do NOT hand in a hard copy of your write-up. Contact the TAs if you are having technical difficulties in submitting the assignment.

 

• No code is needed for any of these problems. You can do the calculations however you please. You need to turn in only the report. Please name your report as ⟨NetID⟩-hw6.pdf.

 

1. [Naïve Bayes and Learning Threshold Functions - 25 points]

 

Consider the Boolean function $f_{TH(4,9)}$. This is a threshold function defined on the 9-dimensional Boolean cube as follows: given an instance $x$, $f_{TH(4,9)}(x) = 1$ if and only if 4 or more of $x$'s components are 1. In other words, $f_{TH(4,9)}$ is a function $f : \{0,1\}^9 \to \{0,1\}$.
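
For concreteness, here is a minimal sketch of this threshold function (illustrative only; the helper name f_th is ours, and recall that no code is required for this assignment):

```python
def f_th(x, k=4):
    """Return 1 iff at least k components of the Boolean vector x are 1."""
    return int(sum(x) >= k)

# Example: four 1s among nine components -> label 1; three 1s -> label 0.
print(f_th([1, 1, 1, 1, 0, 0, 0, 0, 0]))  # 1
print(f_th([1, 1, 1, 0, 0, 0, 0, 0, 0]))  # 0
```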

 

(a) [5 points] Show that $f_{TH(4,9)}$ has a linear decision surface over the 9-dimensional Boolean cube.

 

(b) [10 points] Assume that you are given data sampled according to the uniform distribution over the Boolean cube $\{0,1\}^9$ and labeled according to $f_{TH(4,9)}$. Use naïve Bayes to learn a hypothesis that predicts these labels. What is the hypothesis generated by the naïve Bayes algorithm? (You may assume that you have seen all the data required to get accurate estimates of the probabilities.)

 

(c) [5 points] Show that the final hypothesis in (b) does not represent this function.

(d) [5 points] Are the naïve Bayes assumptions satisfied by $f_{TH(4,9)}$? Justify.

2. [Multivariate Poisson naïve Bayes - 30 points] In this question, we consider the problem of classifying Piazza posts ($Y$) into two categories: student posts ($A$) and instructor posts ($B$). For every post, we have two attributes: the number of words ($X_1$) and the number of mathematical symbols ($X_2$). We assume that each attribute ($X_i$, $i = 1, 2$) is related to a post category ($A$/$B$) via a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a particular mean ($\lambda_i^A$ or $\lambda_i^B$). That is,

$$\Pr[X_i = x \mid Y = A] = \frac{e^{-\lambda_i^A} (\lambda_i^A)^x}{x!}
\qquad \text{and} \qquad
\Pr[X_i = x \mid Y = B] = \frac{e^{-\lambda_i^B} (\lambda_i^B)^x}{x!}
\qquad \text{for } i = 1, 2.$$
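
As a quick illustration of this density, here is a minimal sketch (illustrative only; poisson_pmf and the sample numbers are ours):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Pr[X = x] for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

# Example: with an illustrative mean of 2.0, the probability of observing x = 3.
print(poisson_pmf(3, 2.0))  # ~0.180
```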



X1   X2   Y
 0    3   A
 4    8   A
 2    4   A
 6    2   B
 3    5   B
 2    1   B
 5    4   B

Table 1: Dataset for Poisson naïve Bayes

 

 

 

Assume that the given data in Table 1 is generated by a Poisson naïve Bayes model. You will use this data to develop a naïve Bayes predictor over the Poisson distribution.

 

 

 

 



Pr(Y = A) =                    Pr(Y = B) =

$\lambda_1^A$ =                $\lambda_1^B$ =

$\lambda_2^A$ =                $\lambda_2^B$ =

Table 2: Parameters for Poisson naïve Bayes

 

 

 

 

(a) [10 points] Compute the prior probabilities and parameter values, i.e., fill out Table 2. [Hint: Use MLE to compute the λ's. A generic estimate-and-predict sketch follows part (d) below.]

 

(b) [10 points] Based on the parameter values from Table 2, compute

 

$\Pr(X_1 = 2, X_2 = 3 \mid Y = A)$
$\Pr(X_1 = 2, X_2 = 3 \mid Y = B)$

 

(c) [5 points] Derive an algebraic expression for the Poisson naïve Bayes predictor for $Y$ in terms of the parameters estimated from the data.

(d) [5 points] Use the parameters estimated from the data given in Table 1 to create a Poisson naïve Bayes classifier. What will the classifier predict as the value of $Y$, given the data point $X_1 = 2$, $X_2 = 3$?
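
As referenced in the hint for part (a), below is a minimal, generic sketch of a Poisson naïve Bayes pipeline, resting on two standard facts: the MLE of a Poisson mean is the sample average, and prediction picks the class with the larger log-posterior. All names here (fit_poisson_nb, predict, and so on) are ours, and the assignment itself requires no code:

```python
from math import factorial, log

def fit_poisson_nb(data):
    """data: list of (x1, x2, label) rows. Returns class priors and
    per-class Poisson means (the MLE of a Poisson mean is the sample
    average). Assumes every estimated mean is positive."""
    priors, lambdas = {}, {}
    for y in sorted({label for _, _, label in data}):
        rows = [(x1, x2) for x1, x2, label in data if label == y]
        priors[y] = len(rows) / len(data)
        lambdas[y] = [sum(col) / len(rows) for col in zip(*rows)]
    return priors, lambdas

def predict(priors, lambdas, x):
    """Return the label maximizing log Pr(y) + sum_i log Pr(X_i = x_i | y)."""
    def log_pmf(k, lam):
        # log of the Poisson pmf: e^{-lam} lam^k / k!
        return -lam + k * log(lam) - log(factorial(k))
    def score(y):
        return log(priors[y]) + sum(log_pmf(k, lam)
                                    for k, lam in zip(x, lambdas[y]))
    return max(priors, key=score)
```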

 

3. [Naïve Bayes over Multinomial Distribution - 35 points]

 

In this question, we will look into training a naïve Bayes classifier with a model that uses a multinomial distribution to represent documents. Assume that all the documents are written in a language which has only three words: a, b, and c. All the documents have exactly $n$ words (each word can be either a, b, or c). We are given a labeled document collection $\{D_1, D_2, \ldots, D_m\}$. The label $y_i$ of document $D_i$ is 1 or 0, indicating whether $D_i$ is "good" or "bad".

 


This model uses the multinomial distribution in the following way: given the $i$th document $D_i$, we denote by $a_i$ (respectively, $b_i$, $c_i$) the number of times that word a (respectively, b, c) appears in $D_i$. Therefore, $a_i + b_i + c_i = |D_i| = n$. We define

$$\Pr(D_i \mid y = 1) = \frac{n!}{a_i!\,b_i!\,c_i!}\,\alpha_1^{a_i}\,\beta_1^{b_i}\,\gamma_1^{c_i}$$

 

where $\alpha_1$ (respectively, $\beta_1$, $\gamma_1$) is the probability that word a (respectively, b, c) appears in a "good" document. Therefore, $\alpha_1 + \beta_1 + \gamma_1 = 1$. Similarly,

$$\Pr(D_i \mid y = 0) = \frac{n!}{a_i!\,b_i!\,c_i!}\,\alpha_0^{a_i}\,\beta_0^{b_i}\,\gamma_0^{c_i}$$

 

where $\alpha_0$ (respectively, $\beta_0$, $\gamma_0$) is the probability that word a (respectively, b, c) appears in a "bad" document. Therefore, $\alpha_0 + \beta_0 + \gamma_0 = 1$.
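
To make the generative model concrete, here is a minimal sketch of this document likelihood (illustrative only; the name multinomial_likelihood and the sample counts are ours):

```python
from math import factorial

def multinomial_likelihood(a, b, c, alpha, beta, gamma):
    """Pr(D | y) for a document with counts (a, b, c) of words a, b, c,
    under class-conditional word probabilities (alpha, beta, gamma)."""
    n = a + b + c
    coeff = factorial(n) // (factorial(a) * factorial(b) * factorial(c))
    return coeff * alpha**a * beta**b * gamma**c

# Example: a 5-word document "a a b c c" under illustrative parameters.
print(multinomial_likelihood(2, 1, 2, 0.5, 0.2, 0.3))  # 0.135
```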

 

(a) [2 points] What information do we lose when we represent documents using the aforementioned model?

(b) [5 points] Write down the expression for the log-likelihood of the document $D_i$, $\log \Pr(D_i, y_i)$. Assume that the prior probability $\Pr(y_i = 1)$ is $\theta$.

 

(c) [28 points] Derive the expression for the maximum likelihood estimates of the parameters $\alpha_1$, $\beta_1$, $\gamma_1$, $\alpha_0$, $\beta_0$, and $\gamma_0$. (A generic sketch of the constrained-MLE technique follows the grading note below.)

 

Submission note: You need not show the derivation of all six parameters separately. Some parameters are symmetric to others, so once you derive the expression for one, you can directly write down the expressions for the others.

 

Grading note: 8 points for the derivation of one of the parameters, and 4 points each for the remaining five parameter expressions.
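
As referenced in part (c), the standard tool here is constrained maximization with a Lagrange multiplier. Below is a generic sketch for a single multinomial with counts $(a, b, c)$ and parameters $(\alpha, \beta, \gamma)$; the notation is our simplification, not the full six-parameter problem:

```latex
% Generic constrained-MLE sketch: maximize the count-weighted log-likelihood
% over the probability simplex, via a Lagrange multiplier mu.
\[
  \mathcal{L}(\alpha,\beta,\gamma,\mu)
    = a\log\alpha + b\log\beta + c\log\gamma
    + \mu\,(1 - \alpha - \beta - \gamma)
\]
% Setting each partial derivative to zero:
\[
  \frac{\partial\mathcal{L}}{\partial\alpha} = \frac{a}{\alpha} - \mu = 0
  \;\Rightarrow\; \alpha = \frac{a}{\mu},
  \qquad \beta = \frac{b}{\mu},
  \qquad \gamma = \frac{c}{\mu}
\]
% The constraint alpha + beta + gamma = 1 forces mu = a + b + c, so
\[
  \hat{\alpha} = \frac{a}{a+b+c}, \qquad
  \hat{\beta}  = \frac{b}{a+b+c}, \qquad
  \hat{\gamma} = \frac{c}{a+b+c}
\]
```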

 

4. [Dice Roll - 10 points]

 

Consider a scheme to generate a series of numbers as follows: for each element in the series, a die is first rolled. If it comes up as one of the numbers from 1 to 5, that number is shown to the user. On the other hand, if the roll comes up as 6, then the die is rolled a second time, and the outcome of this second roll is shown to the user.

 

Assume that the probability of a die roll coming up as 6 is $p$. Also assume that if a roll doesn't come up as 6, then the remaining numbers are equally likely, i.e., each of 1 through 5 appears with probability $(1-p)/5$. Suppose you see the sequence 3463661622 generated by the scheme above. What is the most likely value of $p$ for this sequence?
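
A minimal numeric sketch of this kind of MLE, under our reading of the scheme: a shown 6 requires two consecutive 6s, so $\Pr(\text{show } 6) = p^2$, while a shown $s \in \{1, \dots, 5\}$ arises either directly or after a first-roll 6, so $\Pr(\text{show } s) = (1-p)/5 + p(1-p)/5$. The grid search below is illustrative only; the problem expects an analytic maximization:

```python
from math import log

def sequence_log_likelihood(seq, p):
    """Log-likelihood of a shown digit sequence under the two-stage scheme:
    Pr(show 6) = p^2, and Pr(show s) = (1 - p)(1 + p) / 5 for s in 1..5."""
    return sum(log(p * p) if ch == "6" else log((1 - p) * (1 + p) / 5)
               for ch in seq)

# Illustrative grid search for the maximizing p on (0, 1).
seq = "3463661622"
best = max((k / 1000 for k in range(1, 1000)),
           key=lambda p: sequence_log_likelihood(seq, p))
print(best)  # ~0.632 under this model, i.e. close to sqrt(2/5)
```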
