Assignment 2 Solution

Instructions: Submit two files only: writeup.pdf and code.zip. Only questions 1 and 4.2 are programming questions; written answers are expected for all other questions. For all questions, show all your work, including intermediate steps.




1 Linear Regression [20 pts]




Using any programming language, implement linear least-squares regression (i.e., with no polynomial terms) with the penalty term $\lambda w^T w$. Download the dataset called regression-dataset.zip from the course webpage, which contains the training data and labels for each fold. The input and output spaces are continuous (i.e., $x \in \mathbb{R}^d$ and $y \in \mathbb{R}$). Find the best $\lambda$ by 10-fold cross-validation. Draw a graph that shows the Euclidean error as $\lambda$ increases from 0 to 4 in increments of 0.1.
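A minimal sketch of one way to set this up in Python with NumPy, using the closed-form ridge solution $w = (X^T X + \lambda I)^{-1} X^T y$. The per-fold file names below are assumptions; substitute whatever regression-dataset.zip actually contains.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical file layout: one input/label file per fold, e.g.
# fold1_X.csv ... fold10_X.csv and fold1_y.csv ... fold10_y.csv.
folds_X = [np.loadtxt(f"fold{k}_X.csv", delimiter=",") for k in range(1, 11)]
folds_y = [np.loadtxt(f"fold{k}_y.csv", delimiter=",") for k in range(1, 11)]

def ridge_fit(X, y, lam):
    # Closed-form minimizer of ||Xw - y||^2 + lam * w^T w
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

lambdas = np.arange(0.0, 4.0001, 0.1)   # 0 to 4 in increments of 0.1
cv_err = []
for lam in lambdas:
    err = 0.0
    for k in range(10):                  # hold out fold k, train on the rest
        X_tr = np.vstack([folds_X[j] for j in range(10) if j != k])
        y_tr = np.concatenate([folds_y[j] for j in range(10) if j != k])
        w = ridge_fit(X_tr, y_tr, lam)
        err += np.linalg.norm(folds_X[k] @ w - folds_y[k])  # Euclidean error
    cv_err.append(err / 10)

print(f"best lambda: {lambdas[int(np.argmin(cv_err))]:.1f}")
plt.plot(lambdas, cv_err)
plt.xlabel("lambda")
plt.ylabel("mean Euclidean error (10-fold CV)")
plt.show()
```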




2 Maximum Likelihood and Maximum a Posteriori Estimates [20 pts]




2.1 MLE [10 pts]




Consider the probability density function below:




$$P_\theta(x) = \theta^3\, x\, e^{-\theta x^2}$$




where $\theta$ is a parameter and $x$ is a positive real number. Suppose you get $m$ i.i.d. samples $x_1, \ldots, x_m$ drawn from this distribution. Show how one can compute the maximum likelihood estimate of $\theta$ based on these samples.
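A sketch of the derivation, assuming the density reconstructs as above (the parameter name $\theta$ is our choice): maximize the log-likelihood in $\theta$,

$$\ell(\theta) = \sum_{i=1}^m \log P_\theta(x_i) = 3m \log\theta + \sum_{i=1}^m \log x_i - \theta \sum_{i=1}^m x_i^2,$$

$$\frac{d\ell}{d\theta} = \frac{3m}{\theta} - \sum_{i=1}^m x_i^2 = 0 \quad\Longrightarrow\quad \hat\theta_{\mathrm{MLE}} = \frac{3m}{\sum_{i=1}^m x_i^2}.$$

The second derivative $-3m/\theta^2 < 0$ confirms this stationary point is a maximum.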




2.2 MAP [10 pts]




Consider the gamma probability density function below:




$$P_{\alpha,\beta}(x) = \frac{\beta^\alpha\, x^{\alpha-1}\, e^{-\beta x}}{\Gamma(\alpha)}$$




Show that the gamma distribution is a conjugate prior of the exponential distribution $P_\lambda(x) = \lambda e^{-\lambda x}$.



Find the maximum a posteriori (MAP) estimate of $\lambda$ as a function of $\alpha$ and $\beta$.
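A sketch of the argument, under the reconstructed densities above: given $m$ i.i.d. samples $x_1, \ldots, x_m$ from the exponential likelihood and a $\mathrm{Gamma}(\alpha, \beta)$ prior on $\lambda$,

$$P(\lambda \mid x_{1:m}) \;\propto\; \lambda^m e^{-\lambda \sum_i x_i} \cdot \lambda^{\alpha-1} e^{-\beta\lambda} \;=\; \lambda^{(\alpha+m)-1}\, e^{-(\beta + \sum_i x_i)\lambda},$$

which is again a gamma density, now with parameters $\alpha' = \alpha + m$ and $\beta' = \beta + \sum_i x_i$; hence the gamma prior is conjugate. Setting the derivative of $\log P(\lambda \mid x_{1:m})$ to zero gives the posterior mode

$$\hat\lambda_{\mathrm{MAP}} = \frac{\alpha' - 1}{\beta'} = \frac{\alpha + m - 1}{\beta + \sum_{i=1}^m x_i}.$$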



3 Generative vs Discriminative Models [20 pts]




In class, we learned that when $X = \langle X_1, X_2, \ldots, X_n \rangle$ where each $X_i$ is distributed normally, and $Y$ is Boolean, then logistic regression is the discriminative equivalent of Naive Bayes.




Prove that $P(Y \mid X)$ takes the logistic form when $X$ is a vector of Boolean variables, i.e., logistic regression is also the discriminative counterpart of a Naive Bayes classifier over Boolean features.
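One possible proof sketch (the parameterization $\theta_{iy} = P(X_i = 1 \mid Y = y)$ is our notation): by Bayes' rule and the Naive Bayes independence assumption,

$$P(Y=1 \mid X) = \frac{1}{1 + \exp\!\Big(\log\frac{P(Y=0)}{P(Y=1)} + \sum_i \log\frac{P(X_i \mid Y=0)}{P(X_i \mid Y=1)}\Big)}.$$

Substituting the Bernoulli form $P(X_i \mid Y=y) = \theta_{iy}^{X_i} (1-\theta_{iy})^{1-X_i}$ and collecting the terms linear in $X_i$ yields

$$P(Y=1 \mid X) = \frac{1}{1 + \exp\!\big(-(w_0 + \sum_i w_i X_i)\big)}, \qquad w_i = \log\frac{\theta_{i1}(1-\theta_{i0})}{\theta_{i0}(1-\theta_{i1})},$$

with $w_0 = \log\frac{P(Y=1)}{P(Y=0)} + \sum_i \log\frac{1-\theta_{i1}}{1-\theta_{i0}}$, which is exactly the logistic form.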




4 Support Vector Machines [40 pts]




4.1 Constrained Optimization [20 pts]




Consider the following optimization problem, with variable $x \in \mathbb{R}$.




minimize $\;x^2 + 1$

subject to $\;(x-2)(x-4) \le 0$







What is the optimal point $x^\star$ and optimal value $p^\star$ of this constrained optimization problem?



State the Lagrangian and the dual function $g(\lambda)$.
State the Lagrange dual optimization problem. Find the dual optimal point $\lambda^\star$ and the dual optimal value $d^\star$. Does strong duality hold? Explain your answer.
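A sketch of the computation (our working, worth verifying): the constraint $(x-2)(x-4) \le 0$ means the feasible set is $2 \le x \le 4$, so

$$p^\star = \min_{2 \le x \le 4} \; x^2 + 1 = 5 \quad \text{at } x^\star = 2.$$

The Lagrangian is

$$L(x, \lambda) = x^2 + 1 + \lambda(x-2)(x-4) = (1+\lambda)x^2 - 6\lambda x + 1 + 8\lambda,$$

and minimizing over $x$ (the minimizer is $x = 3\lambda/(1+\lambda)$ when $\lambda > -1$) gives the dual function

$$g(\lambda) = 1 + 8\lambda - \frac{9\lambda^2}{1+\lambda} \;\;(\lambda > -1), \qquad g(\lambda) = -\infty \text{ otherwise}.$$

The dual problem is to maximize $g(\lambda)$ subject to $\lambda \ge 0$; setting $g'(\lambda) = 0$ yields $\lambda^\star = 2$ and $d^\star = g(2) = 5 = p^\star$, so strong duality holds (the problem is convex and, e.g., $x = 3$ is strictly feasible, so Slater's condition applies).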






4.2 SVM with Kernels [20 pts]




The Ionosphere dataset from the UCI Machine Learning Repository contains radar data collected by a system in Goose Bay, Labrador. Received signals were processed using an autocorrelation function whose arguments are the time of a pulse and the pulse number. There were 17 pulse numbers. Instances in this dataset are described by 2 attributes per pulse number, corresponding to the complex values returned by the function resulting from the complex electromagnetic signal. The label to predict is either g = “good” or b = “bad”. Download the dataset called ionosphere.csv from the course webpage.




Use your favourite machine learning toolkit to compare the performance of the SVM using each of the following kernels: linear, polynomial, and radial basis. The polynomial and radial basis kernels each take parameters. The hyper-parameter $C$ controls the tradeoff between training error and margin. Experiment with various combinations of the hyper-parameter $C \in \{1, 10, 100\}$ and the kernel parameters.
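A minimal sketch of one such experiment in Python with scikit-learn, assuming ionosphere.csv has the numeric feature columns followed by a final g/b label column and no header row (adjust the parsing if the file differs). The degree and gamma values are illustrative choices, not prescribed by the assignment.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Assumed layout: feature columns, then a g/b label in the last column.
data = np.genfromtxt("ionosphere.csv", delimiter=",", dtype=str)
X = data[:, :-1].astype(float)
y = (data[:, -1] == "g").astype(int)

# One parameter grid per kernel.
grids = {
    "linear": {"kernel": ["linear"], "C": [1, 10, 100]},
    "poly":   {"kernel": ["poly"], "C": [1, 10, 100], "degree": [2, 3, 4]},
    "rbf":    {"kernel": ["rbf"], "C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]},
}

for name, grid in grids.items():
    search = GridSearchCV(SVC(), grid, cv=5)   # 5-fold CV over the grid
    search.fit(X, y)
    best = search.best_estimator_              # refit on all data
    print(f"{name}: best params {search.best_params_}, "
          f"training accuracy {best.score(X, y):.3f}")
```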




Explain your procedure and report the training accuracy. Which kernel performed best?




What does this say, if anything, about the data?