
Machine Learning Homework #2 Solution

1. (40 points) Let X = {x1, . . . , xn} be a set of n samples drawn i.i.d. from a univariate distribution with density function p(x|θ), where θ is an unknown parameter. In general θ will belong to a specified subset of R, the set of reals. For the following choices of p(x|θ), derive the maximum likelihood estimate of θ based on the samples X:[1]

 



(a) (10 points) p(x|θ) = (1/√(2πθ^2)) exp(−x^2/(2θ^2)), θ > 0.

(b) (10 points) p(x|θ) = (1/θ) exp(−x/θ), 0 ≤ x < ∞, θ > 0.

(c) (10 points) p(x|θ) = θx^(θ−1), 0 ≤ x ≤ 1, 0 < θ < ∞.

(d) (10 points) p(x|θ) = 1/θ, 0 ≤ x ≤ θ, θ > 0.
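As a sketch of the standard recipe that each case follows (write the log-likelihood, differentiate with respect to θ, set to zero), case (b) works out as:

```latex
\ell(\theta) = \log \prod_{i=1}^{n} \frac{1}{\theta} e^{-x_i/\theta}
             = -n\log\theta - \frac{1}{\theta}\sum_{i=1}^{n} x_i,
\qquad
\frac{d\ell}{d\theta} = -\frac{n}{\theta} + \frac{1}{\theta^2}\sum_{i=1}^{n} x_i = 0
\;\Longrightarrow\;
\hat{\theta}_{\mathrm{ML}} = \frac{1}{n}\sum_{i=1}^{n} x_i .
```

Cases (a), (c), and (d) follow the same pattern, although for (d) the likelihood is monotone in θ on the feasible set, so the maximizer comes from the constraint 0 ≤ xi ≤ θ rather than from a stationary point.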

 

2.  (20 points) Let X  = {x1, . . . , xn }, xi ∈ Rd be a set of n samples drawn i.i.d. from a multivariate Gaussian  distribution in Rd  with  mean µ and  covariance  matrix  Σ.  Recall that the  density function  of a multivariate Gaussian  distribution is given by:

p(x|µ, Σ) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp( −(1/2) (x − µ)^T Σ^(−1) (x − µ) ).

(a) (10 points) Derive the maximum likelihood estimates for the mean µ and covariance Σ based on the sample set X.[1][2]

(b) (5 points) Let µ̂n be the estimate of the mean. Is µ̂n a biased estimate of the true mean µ? Clearly justify your answer by computing E[µ̂n].

(c) (5 points) Let Σ̂n be the estimate of the covariance. Is Σ̂n a biased estimate of the true covariance Σ? Clearly justify your answer by computing E[Σ̂n].
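For reference when checking parts (b) and (c), the standard results that the derivations should recover are:

```latex
\hat{\mu}_n = \frac{1}{n}\sum_{i=1}^{n} x_i,\qquad
\mathbb{E}[\hat{\mu}_n] = \mu \quad\text{(unbiased)};
\qquad
\hat{\Sigma}_n = \frac{1}{n}\sum_{i=1}^{n}(x_i-\hat{\mu}_n)(x_i-\hat{\mu}_n)^{\top},\qquad
\mathbb{E}[\hat{\Sigma}_n] = \frac{n-1}{n}\,\Sigma \quad\text{(biased)}.
```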

 

Programming assignment:

The next problem involves programming. For Question 3, we will be using the 2-class classification datasets Boston50 and Boston75, and the 10-class classification dataset Digits, which were used in Homework 1.

 

3. (40 points) We will develop a parametric classifier by modeling each class conditional distribution p(x|Ci) as a multivariate Gaussian. In particular, using the training data, we will estimate the class prior probabilities p(Ci) and the class conditional probabilities p(x|Ci) based on the estimated mean µ̂i and the estimated covariance Σ̂i for each class Ci. The classification will be done based on the following discriminant function:

 

gi (x) = log p(Ci ) + log p(x|Ci ) .
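Substituting the multivariate Gaussian density from Question 2 into this discriminant, gi(x) expands to:

```latex
g_i(x) = \log p(C_i)
       - \frac{d}{2}\log(2\pi)
       - \frac{1}{2}\log\lvert\Sigma_i\rvert
       - \frac{1}{2}(x-\mu_i)^{\top}\Sigma_i^{-1}(x-\mu_i).
```

The term −(d/2) log(2π) is common to all classes and may be dropped when comparing discriminants.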

 

[1] You have to show the details of your derivation. A correct answer without the details will not get any credit.

[2] You can use material from the Matrix Cookbook for your derivation.

We will develop code for MultiGaussClassify with corresponding MultiGaussClassify.fit(X,y) and MultiGaussClassify.predict(X) functions. Parameters for each class should be initialized to be zero mean and identity covariance, i.e., µi = 0 and Σi = I.
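A minimal sketch of such a class is below. The constructor signature (k classes, d features) and the small diagonal term added to each estimated covariance to keep it invertible are assumptions of this sketch, not requirements of the assignment:

```python
import numpy as np

class MultiGaussClassify:
    """Gaussian class-conditional classifier: one (mu_i, Sigma_i) per class."""

    def __init__(self, k, d):
        # Zero mean and identity covariance initialization, as required.
        self.k, self.d = k, d
        self.classes_ = np.arange(k)
        self.means = [np.zeros(d) for _ in range(k)]
        self.covs = [np.eye(d) for _ in range(k)]
        self.priors = [1.0 / k] * k

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.means, self.covs, self.priors = [], [], []
        for c in self.classes_:
            Xc = X[y == c]
            mu = Xc.mean(axis=0)                          # MLE mean
            Sigma = (Xc - mu).T @ (Xc - mu) / len(Xc)     # MLE covariance
            Sigma += 1e-6 * np.eye(X.shape[1])            # assumed regularizer
            self.means.append(mu)
            self.covs.append(Sigma)
            self.priors.append(len(Xc) / len(X))          # class prior p(C_i)
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        scores = np.empty((len(X), len(self.classes_)))
        for i, (mu, S, p) in enumerate(zip(self.means, self.covs, self.priors)):
            diff = X - mu
            _, logdet = np.linalg.slogdet(S)
            maha = np.einsum('ij,ij->i', diff @ np.linalg.inv(S), diff)
            # g_i(x) = log p(C_i) + log p(x|C_i), dropping the constant term
            scores[:, i] = np.log(p) - 0.5 * logdet - 0.5 * maha
        return self.classes_[np.argmax(scores, axis=1)]
```

The einsum call computes the Mahalanobis term (x − µi)ᵀΣi⁻¹(x − µi) for every row at once; slogdet avoids overflow that np.linalg.det can hit in high dimensions.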

We will compare the performance of MultiGaussClassify with LogisticRegression[3] on three datasets: Boston50, Boston75, and Digits. Using my_cross_val with 5-fold cross-validation, report the error rates in each fold as well as the mean and standard deviation of error rates across folds for the two methods, MultiGaussClassify and LogisticRegression, applied to the three classification datasets: Boston50, Boston75, and Digits.

You will have to submit  (a) code and (b) summary of results:

 

(a)  Code:  You will have  to submit  code for MultiGaussClassify() as well as a wrapper code q3().

For MultiGaussClassify(), you are encouraged to consult the code for LinearSVC (or LogisticRegression) in scikit-learn to build your code for MultiGaussClassify.[4] You need to make sure you have __init__, fit, and predict implemented in MultiGaussClassify.

If you consult the code from scikit-learn, keep in mind that your code does not have to exactly follow the structure used in scikit-learn (e.g., class inheritance), since your code is expected to be much simpler than that. For example, for LogisticRegression in sklearn, you can look for the class definition of LogisticRegression in logistic.py, and the function definitions of __init__(), fit(), etc. within that class. This is just to give you some hints on the code structure and some parameters, inputs, and outputs you may need to consider in your own code for the new class MultiGaussClassify(). Your class will NOT inherit any base class in sklearn. Again, the three functions you must implement in this class are __init__, fit, and predict.

The wrapper code (main file) has no input and is used to prepare the datasets and make calls to my_cross_val(method,X,y,k) to generate the error rate results for each dataset and each method. The code for my_cross_val(method,X,y,k) must be developed by yourself (e.g., code you wrote in HW1, with modifications as needed), and you cannot use cross_val_score() in sklearn. The results should be printed to the terminal (not written to an additional file in the folder). Make sure the calls to my_cross_val(method,X,y,k) are made in the following order, and add a print to the terminal before each call to show which method and dataset is being used:

1. MultiGaussClassify with Boston50;
2. MultiGaussClassify with Boston75;
3. MultiGaussClassify with Digits;
4. LogisticRegression with Boston50;
5. LogisticRegression with Boston75;
6. LogisticRegression with Digits.

*For the wrapper code, you need to make a q3.py file for it, and one should be able to run your code by calling "python q3.py" in a command line window.
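A minimal sketch of a my_cross_val compatible with the calls above is shown here. The sequential (unshuffled) fold split is an assumption of this sketch; you may prefer the shuffling behavior from your HW1 version:

```python
import numpy as np

def my_cross_val(method, X, y, k):
    """Sketch of k-fold cross-validation: returns the per-fold error rates.
    `method` is any object exposing fit(X, y) and predict(X).
    Folds are taken in sequence (no shuffling) -- an assumption here."""
    X, y = np.asarray(X), np.asarray(y)
    folds = np.array_split(np.arange(len(y)), k)  # k roughly equal index blocks
    errs = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.hstack([folds[j] for j in range(k) if j != i])
        method.fit(X[train_idx], y[train_idx])
        yhat = method.predict(X[test_idx])
        errs.append(float(np.mean(yhat != y[test_idx])))  # fold error rate
    return errs
```

q3.py would then loop over the two methods and three datasets in the order listed above, printing a line naming each method/dataset combination before calling my_cross_val(method, X, y, 5) and reporting the returned error rates with their mean and standard deviation.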

(b) Summary of results: For each dataset and each method, report the test set error rates for each of the k = 5 folds, the mean error rate over the k folds, and the standard deviation of the error rates over the k folds. Make a table to present the results for each method and each dataset (6 tables in total). Each column of the table represents a fold, and add two columns at the end to show the overall mean error rate and standard deviation over the k folds.

[3] You should use LogisticRegression from scikit-learn, similar to HW1.

[4] The exact location of the Python code will depend on your installation. For example, if you are using Anaconda, look for the LinearSVC code in Anaconda3\Lib\site-packages\sklearn\svm\classes.py and LogisticRegression in Anaconda3\Lib\site-packages\sklearn\linear_model\logistic.py. Your code should be considerably simpler than these examples.

 

Additional instructions: Code can only be written in Python (not IPython notebook); no other programming languages will be accepted. One should be able to execute all programs directly from the command prompt (e.g., "python q3.py") without the need to run a Python interactive shell first. Test your code yourself before submission and suppress any warning messages that may be printed. Your code must run on a CSE lab machine (e.g., csel-kh1260-01.cselabs.umn.edu). Please make sure you specify the version of Python you are using, as well as instructions on how to run your program, in the README file (which must be readable in a text editor such as Notepad). Information on the size of the datasets, including the number of data points and the dimensionality of features, as well as the number of classes, can be readily extracted from the datasets in scikit-learn. Each function must take its inputs in the order specified in the problem and display the output via the terminal or as specified.

For each part, you can submit additional files/functions (as needed) which will be used by the main file. Please put comments in your code so that one can follow its key parts and steps.

Follow the rules strictly. If we  cannot run your code, you  will  not get any credit.

 

•  Things to submit

 

1. hw2.pdf: A document which contains the solutions to Problems 1, 2, and 3, including the summary of results for Problem 3. This document must be in PDF format (no Word documents, photos, etc. accepted). If you submit a scanned copy of a hand-written document, make sure the copy is clearly readable; otherwise no credit may be given.

2.  Python code for Problem  3 (must  include the required  q3.py).

3. README.txt: A README file that contains your name, student ID, email, instructions on how to run your code, the full Python version you are using, any assumptions you are making, and any other necessary details. The file must be readable by a text editor such as Notepad.

4.  Any other  files, except the data,  which are necessary for your code.
