Assignment 2 Solution

Instructions: Submit two files only: writeup.pdf and code.zip. Only questions 1 and 4.2 are programming questions; written answers are expected for all other questions. For all questions, show all your work, including intermediate steps.




1 Linear Regression [20 pts]




Using any programming language, implement linear least-squares regression (i.e., with no polynomial terms) with the penalty term $\lambda w^T w$. Download the dataset called regression-dataset.zip from the course webpage, which contains the training data and labels for each fold. The input and output spaces are continuous (i.e., $x \in \mathbb{R}^d$ and $y \in \mathbb{R}$). Find the best $\lambda$ by 10-fold cross-validation. Draw a graph that shows the Euclidean error as $\lambda$ increases from 0 to 4 in increments of 0.1.
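A minimal sketch of one way to set this up in Python with NumPy, using the closed-form ridge solution $w = (X^T X + \lambda I)^{-1} X^T y$. The per-fold file names below are assumptions; substitute whatever regression-dataset.zip actually contains.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical file layout: one input/label file per fold, e.g.
# fold1_X.csv ... fold10_X.csv and fold1_y.csv ... fold10_y.csv.
folds_X = [np.loadtxt(f"fold{k}_X.csv", delimiter=",") for k in range(1, 11)]
folds_y = [np.loadtxt(f"fold{k}_y.csv", delimiter=",") for k in range(1, 11)]

def ridge_fit(X, y, lam):
    # Closed-form minimizer of ||Xw - y||^2 + lam * w^T w
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

lambdas = np.arange(0.0, 4.0001, 0.1)   # 0 to 4 in increments of 0.1
cv_err = []
for lam in lambdas:
    err = 0.0
    for k in range(10):                  # hold out fold k, train on the rest
        X_tr = np.vstack([folds_X[j] for j in range(10) if j != k])
        y_tr = np.concatenate([folds_y[j] for j in range(10) if j != k])
        w = ridge_fit(X_tr, y_tr, lam)
        err += np.linalg.norm(folds_X[k] @ w - folds_y[k])  # Euclidean error
    cv_err.append(err / 10)

print(f"best lambda: {lambdas[int(np.argmin(cv_err))]:.1f}")
plt.plot(lambdas, cv_err)
plt.xlabel("lambda")
plt.ylabel("mean Euclidean error (10-fold CV)")
plt.show()
```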




2 Maximum Likelihood and Maximum a Posteriori Estimates [20 pts]




2.1 MLE [10 pts]




Consider the probability density function below:




$$P_\theta(x) = \theta^3\, x\, e^{-\theta x^2}$$




where $\theta$ is a parameter and $x$ is a positive real number. Suppose you get $m$ i.i.d. samples $x_1, \ldots, x_m$ drawn from this distribution. Show how one can compute the maximum likelihood estimate of $\theta$ based on these samples.
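A sketch of the derivation, assuming the density reconstructs as above (the parameter name $\theta$ is our choice): maximize the log-likelihood in $\theta$,

$$\ell(\theta) = \sum_{i=1}^m \log P_\theta(x_i) = 3m \log\theta + \sum_{i=1}^m \log x_i - \theta \sum_{i=1}^m x_i^2,$$

$$\frac{d\ell}{d\theta} = \frac{3m}{\theta} - \sum_{i=1}^m x_i^2 = 0 \quad\Longrightarrow\quad \hat\theta_{\mathrm{MLE}} = \frac{3m}{\sum_{i=1}^m x_i^2}.$$

The second derivative $-3m/\theta^2 < 0$ confirms this stationary point is a maximum.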




2.2 MAP [10 pts]




Consider the gamma probability density function below:




$$P_{\alpha,\beta}(x) = \frac{\beta^\alpha\, x^{\alpha-1}\, e^{-\beta x}}{\Gamma(\alpha)}$$




Show that the gamma distribution is a conjugate prior of the exponential distribution $P_\lambda(x) = \lambda e^{-\lambda x}$.



Find the maximum a posteriori (MAP) estimate of $\lambda$ as a function of $\alpha$ and $\beta$.
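A sketch of the argument, under the reconstructed densities above: given $m$ i.i.d. samples $x_1, \ldots, x_m$ from the exponential likelihood and a $\mathrm{Gamma}(\alpha, \beta)$ prior on $\lambda$,

$$P(\lambda \mid x_{1:m}) \;\propto\; \lambda^m e^{-\lambda \sum_i x_i} \cdot \lambda^{\alpha-1} e^{-\beta\lambda} \;=\; \lambda^{(\alpha+m)-1}\, e^{-(\beta + \sum_i x_i)\lambda},$$

which is again a gamma density, now with parameters $\alpha' = \alpha + m$ and $\beta' = \beta + \sum_i x_i$; hence the gamma prior is conjugate. Setting the derivative of $\log P(\lambda \mid x_{1:m})$ to zero gives the posterior mode

$$\hat\lambda_{\mathrm{MAP}} = \frac{\alpha' - 1}{\beta'} = \frac{\alpha + m - 1}{\beta + \sum_{i=1}^m x_i}.$$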



3 Generative vs Discriminative Models [20 pts]




In class, we learned that when $X = \langle X_1, X_2, \ldots, X_n \rangle$ where each $X_i$ is distributed normally, and $Y$ is Boolean, then logistic regression is the discriminative equivalent of Naive Bayes.




Prove that $P(Y \mid X)$ takes the logistic form when $X$ is a vector of Boolean variables, i.e., logistic regression is also the discriminative counterpart of a Naive Bayes classifier over Boolean features.
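One possible proof sketch (the parameterization $\theta_{iy} = P(X_i = 1 \mid Y = y)$ is our notation): by Bayes' rule and the Naive Bayes independence assumption,

$$P(Y=1 \mid X) = \frac{1}{1 + \exp\!\Big(\log\frac{P(Y=0)}{P(Y=1)} + \sum_i \log\frac{P(X_i \mid Y=0)}{P(X_i \mid Y=1)}\Big)}.$$

Substituting the Bernoulli form $P(X_i \mid Y=y) = \theta_{iy}^{X_i} (1-\theta_{iy})^{1-X_i}$ and collecting the terms linear in $X_i$ yields

$$P(Y=1 \mid X) = \frac{1}{1 + \exp\!\big(-(w_0 + \sum_i w_i X_i)\big)}, \qquad w_i = \log\frac{\theta_{i1}(1-\theta_{i0})}{\theta_{i0}(1-\theta_{i1})},$$

with $w_0 = \log\frac{P(Y=1)}{P(Y=0)} + \sum_i \log\frac{1-\theta_{i1}}{1-\theta_{i0}}$, which is exactly the logistic form.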




4 Support Vector Machines [40 pts]




4.1 Constrained Optimization [20 pts]




Consider the following optimization problem, with variable $x \in \mathbb{R}$.




minimize $\;x^2 + 1$

subject to $\;(x-2)(x-4) \le 0$







What is the optimal point $x^\star$ and optimal value $p^\star$ of this constrained optimization problem?



State the Lagrangian and the dual function $g(\lambda)$.
State the Lagrange dual optimization problem. Find the dual optimal point $\lambda^\star$ and the dual optimal value $d^\star$. Does strong duality hold? Explain your answer.
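A sketch of the computation (our working, worth verifying): the constraint $(x-2)(x-4) \le 0$ means the feasible set is $2 \le x \le 4$, so

$$p^\star = \min_{2 \le x \le 4} \; x^2 + 1 = 5 \quad \text{at } x^\star = 2.$$

The Lagrangian is

$$L(x, \lambda) = x^2 + 1 + \lambda(x-2)(x-4) = (1+\lambda)x^2 - 6\lambda x + 1 + 8\lambda,$$

and minimizing over $x$ (the minimizer is $x = 3\lambda/(1+\lambda)$ when $\lambda > -1$) gives the dual function

$$g(\lambda) = 1 + 8\lambda - \frac{9\lambda^2}{1+\lambda} \;\;(\lambda > -1), \qquad g(\lambda) = -\infty \text{ otherwise}.$$

The dual problem is to maximize $g(\lambda)$ subject to $\lambda \ge 0$; setting $g'(\lambda) = 0$ yields $\lambda^\star = 2$ and $d^\star = g(2) = 5 = p^\star$, so strong duality holds (the problem is convex and, e.g., $x = 3$ is strictly feasible, so Slater's condition applies).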






4.2 SVM with Kernels [20 pts]




The Ionosphere dataset from the UCI Machine Learning Repository contains radar data collected by a system in Goose Bay, Labrador. Received signals were processed using an autocorrelation function whose arguments are the time of a pulse and the pulse number. There were 17 pulse numbers. Instances in this dataset are described by 2 attributes per pulse number, corresponding to the complex values returned by the function resulting from the complex electromagnetic signal. The label to predict is either g = “good” or b = “bad”. Download the dataset called ionosphere.csv from the course webpage.




Use your favourite machine learning toolkit to compare the performance of the SVM using each of the following kernels: linear, polynomial, and radial basis. The polynomial and radial basis kernels each take parameters. The hyper-parameter $C$ controls the tradeoff between training error and margin. Experiment with various combinations of the hyper-parameter $C \in \{1, 10, 100\}$ and the kernel parameters.
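A minimal sketch of one such experiment in Python with scikit-learn, assuming ionosphere.csv has the numeric feature columns followed by a final g/b label column and no header row (adjust the parsing if the file differs). The degree and gamma values are illustrative choices, not prescribed by the assignment.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Assumed layout: feature columns, then a g/b label in the last column.
data = np.genfromtxt("ionosphere.csv", delimiter=",", dtype=str)
X = data[:, :-1].astype(float)
y = (data[:, -1] == "g").astype(int)

# One parameter grid per kernel.
grids = {
    "linear": {"kernel": ["linear"], "C": [1, 10, 100]},
    "poly":   {"kernel": ["poly"], "C": [1, 10, 100], "degree": [2, 3, 4]},
    "rbf":    {"kernel": ["rbf"], "C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]},
}

for name, grid in grids.items():
    search = GridSearchCV(SVC(), grid, cv=5)   # 5-fold CV over the grid
    search.fit(X, y)
    best = search.best_estimator_              # refit on all data
    print(f"{name}: best params {search.best_params_}, "
          f"training accuracy {best.score(X, y):.3f}")
```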




Explain your procedure and report the training accuracy. Which kernel performed best?




What does this say, if anything, about the data?