Homework Solution

This homework is intended to let you test your preparation for this class. We will record your answers and mark that you have handed it in, but it will not be fully graded. It will also verify that you have figured out the software environment needed for this class:

- you have installed MATLAB and can write simple programs in it;
- you can access the Canvas environment and submit assignments.




Problem 1: Linear regression learns a linear function of the feature variables $X$ to fit the responses $y$. In this problem, you will derive the closed-form solutions for two linear regression formulations.




1. The standard linear regression can be formulated as solving a least-squares problem

$$\underset{w}{\text{minimize}}\quad \Phi(w) = \|Xw - y\|_2^2 = \langle Xw - y,\ Xw - y \rangle$$

where $X \in \mathbb{R}^{n \times m}$ ($n \ge m$) represents the feature matrix, $y \in \mathbb{R}^{n \times 1}$ represents the response vector, and $w \in \mathbb{R}^{m \times 1}$ is the vector variable of the linear coefficients. Here the $(i,j)$-th element of $X$, denoted $x_{ij}$, is the $j$-th attribute value for the $i$-th data sample (observation), and $y_i$ is the true response for the $i$-th data sample. This is a convex objective function of $w$. Derive the optimal $w$ that minimizes the objective function by setting the gradient of the function with respect to $w$ to zero. To find the gradient, you can use the following formula
$$\Phi(w + \delta) = [X(w+\delta)]^T X(w+\delta) - 2\,[X(w+\delta)]^T y + y^T y = \Phi(w) + 2\,[X\delta]^T [Xw - y] + (X\delta)^T X\delta,$$

and note that the optimal $w$ must be determined so that $\Phi(w + \delta) \ge \Phi(w)$ for any possible vector $\delta$ (why?). Here $X^T$ denotes the transpose of $X$.
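
As an optional numerical aside (not part of the derivation you are asked to write), the following MATLAB sketch checks the standard closed-form least-squares solution $w = (X^T X)^{-1} X^T y$ against MATLAB's built-in least-squares solve on made-up synthetic data; all variable names here are illustrative:

    % Sanity check of the closed-form least-squares solution on synthetic data.
    rng(0);                            % reproducible random numbers
    n = 40; m = 3;                     % n samples, m features (n >= m)
    X = randn(n, m);                   % synthetic feature matrix
    y = randn(n, 1);                   % synthetic response vector
    w_closed = (X' * X) \ (X' * y);    % solves (X'X) w = X'y, the normal equations
    w_builtin = X \ y;                 % MATLAB's own least-squares solution
    disp(norm(w_closed - w_builtin));  % difference should be near machine precision

The backslash solve is used instead of inv(X' * X) because it is numerically more stable.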




 
2. In practice, an L2-norm regularizer is often introduced with the least squares, called Ridge Regression, to overcome ill-posed problems where the Hessian matrix is not positive definite. The objective function of ridge regression is defined as

$$\underset{w}{\text{minimize}}\quad \Phi(w) = \|Xw - y\|_2^2 + \lambda\,\|w\|_2^2 = \left\| \begin{pmatrix} X \\ \sqrt{\lambda}\, I \end{pmatrix} w - \begin{pmatrix} y \\ 0 \end{pmatrix} \right\|_2^2$$

where $\lambda \ge 0$ and $I$ is an $m \times m$ identity matrix. This objective function is strictly convex. Derive the solution of the ridge regression problem to find the optimal $w$.
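
Again as an optional numerical aside: assuming the ridge solution you derive takes the standard form $w = (X^T X + \lambda I)^{-1} X^T y$, this MATLAB sketch (with illustrative synthetic data) checks it against ordinary least squares on the stacked system shown above:

    % Check the ridge solution against the stacked least-squares formulation.
    rng(0);
    n = 40; m = 3; lambda = 0.1;       % illustrative sizes and regularization weight
    X = randn(n, m); y = randn(n, 1);  % synthetic data
    w_ridge = (X' * X + lambda * eye(m)) \ (X' * y);  % (X'X + lambda*I) w = X'y
    Xs = [X; sqrt(lambda) * eye(m)];   % stacked feature matrix
    ys = [y; zeros(m, 1)];             % stacked response vector
    w_stacked = Xs \ ys;               % plain least squares on the stacked system
    disp(norm(w_ridge - w_stacked));   % should be near machine precision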



























Problem 2: Consider a coin with probability of heads equal to $\Pr(H) = p$ and probability of tails $\Pr(T) = 1 - p$. You toss it 5 times and get the outcomes H,H,T,T,H.




 
1. What is the probability of observing the sequence H,H,T,T,H in five tosses? Also give the formula for the natural logarithm of this probability. Your formulas should be functions of $p$.




 
2. You have a box containing exactly 2 coins, one fair with $p = 1/2$ and one biased with $p = 2/3$. You choose one of these two coins at random with equal probability, toss it 5 times, and get the outcome H,H,T,T,H.




 
3. Give the joint probability that the coin chosen was the fair coin ($p = 1/2$) and the outcome was H,H,T,T,H.




 
4. Give the joint probability that the coin chosen was the biased coin ($p = 2/3$) and the outcome was H,H,T,T,H.




 
5. Assuming $p$ were unknown, what should the bias $p = \Pr(H)$ be to maximize the probability of observing H,H,T,T,H, and what is the corresponding probability of observing H,H,T,T,H (i.e., what is the maximum-likelihood estimate for $p$)? Show the derivation. Hint: maximize the log of the function.
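
If you want to sanity-check your derivation numerically, here is a minimal MATLAB sketch. It assumes independent tosses, so the probability of the observed sequence (3 heads, 2 tails) is $p^3(1-p)^2$, and it simply evaluates this likelihood on a grid of $p$ values and reports the grid maximizer:

    % Numerical sanity check for the maximum-likelihood estimate of p.
    p = linspace(0.001, 0.999, 9999);  % grid over the open interval (0, 1)
    L = p.^3 .* (1 - p).^2;            % likelihood of H,H,T,T,H under independence
    [Lmax, idx] = max(L);              % grid maximizer approximates the MLE
    fprintf('p_hat is about %.3f, max likelihood about %.5f\n', p(idx), Lmax);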




Problem 3: Below is the pseudo-code of the perceptron algorithm for binary classification, where $(x_t, r_t)$ is the $t$-th data sample: $x_t$ is the vector of attribute values (real numbers) and $r_t = \pm 1$ is the class label for the $t$-th sample:




 
1. $w = w_0$.
2. Iterate until convergence:
3. For each sample $(x_t, r_t)$, $t = 1, 2, \dots$:
4. If $r_t \langle w, x_t \rangle \le 0$:
5. $w = w + r_t x_t$.



Here "convergence" means that $w$ does not change at all over one pass through the entire training dataset in the loop starting in step 3.
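
For reference, one possible MATLAB reading of steps 1-5 is sketched below. The variable names and the maximum-pass safeguard are my own additions, "step" here counts full passes over the data (adapt the counting to your own reading of the problem), and your submitted MyPerceptron.m must also produce the plots described under Submission:

    function [w, step] = perceptron_sketch(X, y, w0)
    % A sketch of the pseudo-code above. X: n-by-m features; y: n-by-1
    % labels in {+1, -1}; w0: m-by-1 initial weight vector.
        w = w0;
        max_passes = 1000;                     % safeguard for non-separable data
        for step = 1:max_passes
            changed = false;
            for t = 1:size(X, 1)               % step 3: visit each sample in turn
                if y(t) * (X(t, :) * w) <= 0   % step 4: misclassified or on the boundary
                    w = w + y(t) * X(t, :)';   % step 5: perceptron update
                    changed = true;
                end
            end
            if ~changed                        % convergence: a full pass with no change
                break;
            end
        end
    end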




 
1. Implement the perceptron algorithm and test it on the provided data. To begin, do "load data1.mat" to load the data file into MATLAB. $X \in \mathbb{R}^{40 \times 2}$ is the feature matrix of 40 samples in 2 dimensions and $r \in \mathbb{R}^{40 \times 1}$ is the label vector ($\pm 1$). Use the initial value $w_0 = [1; 1]^T$. Now run your perceptron algorithm on the given data. How many iterations does it take to converge?




 
2. Visualize all the samples (use 2 different colors for the 2 different classes) and plot the decision boundary defined by the initial $w_0$. Plot the decision boundary defined by the $w$ returned by the perceptron program.




Hint: To visualize the samples you could use the MATLAB function call scatter(X(:,1), X(:,2), 50, y, '*');










Type help scatter for more information. Plotting the boundary is equivalent to plotting the line $w^T x = w_1 x_1 + w_2 x_2 = 0$. Since all the sample points are located within the square $\{(x_1, x_2) : -1 \le x_1, x_2 \le +1\}$, choose two points $(a, b)$ and $(c, d)$ by setting $a = -1$, $c = +1$ and solving for $b$, $d$, or else setting $b = -1$, $d = +1$ and solving for $a$, $c$, and then draw the line between the two points $(a, b)$ and $(c, d)$ with the command




hold on; plot([a,c],[b,d]); hold off;




Use the hold function to add the line to the existing scatter plot, and axis to adjust the axes if needed. Draw both the initial boundary and the final boundary on the same plot.
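
Putting the hint together, the plotting step might look like the following sketch; it assumes X, y, and a learned weight vector w are already in the workspace and that $w_2 \neq 0$, so the boundary can be solved for $x_2$:

    % Plot samples and the decision boundary w1*x1 + w2*x2 = 0.
    scatter(X(:, 1), X(:, 2), 50, y, '*');   % samples, colored by class label
    a = -1; c = +1;                          % boundary endpoints in x1, per the hint
    b = -w(1) * a / w(2);                    % solve w1*a + w2*b = 0 for b
    d = -w(1) * c / w(2);                    % solve w1*c + w2*d = 0 for d
    hold on; plot([a, c], [b, d]); hold off; % add the boundary to the scatter plot
    axis([-1 1 -1 1]);                       % keep the unit square in view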







Submission




Things to submit:




 
- hw0 sol.pdf: a document containing all the derivations for Problems 1 and 2 and the plot asked for by Problem 3.




 
- MyPerceptron.m: a MATLAB function defined with the header function [w, step] = MyPerceptron(X, y, w0), where X is the feature matrix, y is a label vector ($\pm 1$), and w0 is the initial value for the parameter vector w. In the output, w is the parameter found by the perceptron and step represents the number of steps the algorithm takes to converge. The function should also display the plot of samples and boundary.




 
- Zip both files into a single zip file and name it as your lastname.zip.




Submit: All material must be submitted electronically via Canvas. This homework will not be graded, but it is required as proof of satisfying the prerequisites for taking the class.


















