Assignment #2 Solution

Submit a PDF through ConneX before 11:55pm. I will accept submissions up to 24 hours late for a 10% reduction in your mark. After that point, no submissions will be accepted.




Show your work for all questions.




Different marking schemes will be used for undergrad (SEng 474) and grad (CSc 578D) students. Undergrad students do not have to answer the grad questions.




All code questions use Python/NumPy/PyTorch/Matplotlib. You may install these libraries on your own computer, or you can work in the lab.




For each question, hand in the attached code skeleton with your implementations included. Include graphs and written answers in your PDF.




1. Neural Networks




(SEng 474; CSc 578D: 20 points)




Use the attached code skeleton circle.py to complete this question.




(a) Using PyTorch, build a network that can properly classify a point in 2D space as being inside or outside of the unit circle. See the skeleton code provided. Plot the percent correct on the training data against the iteration number. (A rough sketch of one possible setup appears after part (c).) [10 pts]



(b) Generate a new test batch and run your final model on it. Plot the points in 2D and color them based on their predicted label. We suggest using the provided plot_decision_boundary function. [5 pts]



(c) The simple neural network architecture provided in the skeleton code cannot properly predict for this task. Why? Plotting the predicted labels (as in (b)) for the simple architecture may help you answer this question. [5 pts]
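For orientation, here is a minimal sketch of one possible setup for part (a). It assumes nothing about circle.py beyond what the question states; the sampling range, layer sizes, and training loop below are illustrative choices of ours, not the skeleton's actual interface.

    import torch
    import torch.nn as nn

    def make_batch(n=256):
        # Sample points in [-1.5, 1.5]^2; label 1 if inside the unit circle.
        x = torch.empty(n, 2).uniform_(-1.5, 1.5)
        y = (x.pow(2).sum(dim=1) < 1.0).float().unsqueeze(1)
        return x, y

    # A hidden layer with a nonlinearity is needed to carve out a circular region.
    model = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.BCEWithLogitsLoss()

    accuracy = []
    for it in range(2000):
        x, y = make_batch()
        logits = model(x)
        loss = loss_fn(logits, y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        # Percent correct on this training batch, for the plot in part (a).
        accuracy.append(((logits > 0).float() == y).float().mean().item())

Plotting accuracy against the iteration index with matplotlib then gives the curve requested in (a).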












2. Gradient Descent: Linear Regression




(SEng 474: 30 points; CSc 578D: 40 points. SEng 474 and CSc 578D: 5 bonus points)




In this part of the assignment we will implement a linear regression model using the Boston housing dataset. We will implement two techniques for minimizing the cost function: batch gradient descent and stochastic gradient descent.




You are provided with a skeleton Python program, linreg.py, in which you must implement the missing parts.




First, let's recall some theory and mathematical notation¹. We express our dataset as a matrix X ∈ ℝ^{n×m}, where each row is a training sample and each column represents a feature, and a vector y ∈ ℝ^n of target values. To simplify the notation that follows, we define a training sample as the vector x = [1, x_1, x_2, ..., x_m]; in effect, we add a column of ones to X. The purpose of linear regression is to estimate the vector of parameters w = [w_0, w_1, ..., w_m] such that the hyperplane x w^T fits our training samples in an optimal way. Once we have obtained w, we can predict the value of a test sample x_test simply by plugging it into the equation ŷ = x_test w^T. We estimate w by minimizing a cost function over the entire dataset, defined as follows:

E(w) = \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - x_i w^T \right)^2
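Concretely, the setup above looks like this in NumPy (a toy sketch with made-up data; linreg.py's real variable names may differ):

    import numpy as np

    n, m = 5, 3
    X = np.random.rand(n, m)                # toy stand-in for the feature matrix
    y = np.random.rand(n)                   # toy target values
    Xb = np.hstack([np.ones((n, 1)), X])    # prepend the column of ones
    w = np.zeros(m + 1)                     # w = [w0, w1, ..., wm]
    y_hat = Xb @ w                          # one prediction per row
    E = np.sum((y - y_hat) ** 2) / (2 * n)  # the cost function E(w)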




We apply gradient descent to find the parameter vector w that minimizes the cost function E(w). This is achieved by an iterative algorithm that starts with an initial guess for w and repeatedly performs the update




w_j := w_j - \alpha \frac{\partial}{\partial w_j} E(w), \qquad j = 0, 1, \ldots, m

where α is the learning rate parameter.




In this question you are asked to implement some gradient descent techniques. We are going to set a maximum number of iterations, max_iter in the code, and analyze the behaviour of the cost function by plotting its value at each iteration.




SEng 474; CSc 578D: 30 pts




(a) Implement the batch gradient descent algorithm. Plot the cost function at each iteration. Test it with a few different learning rates, e.g. α = 0.001, 0.01, 0.1. Explain what is happening. [15 pts]



(b) Implement the stochastic gradient descent algorithm. Plot the cost function for different learning rates and compare it with batch gradient descent. (A sketch of both variants follows below.) [15 pts]
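One way the two variants could look, as a hedged sketch: it assumes X already carries the bias column of ones, and the function names here are ours rather than linreg.py's.

    import numpy as np

    def cost(X, y, w):
        # E(w) = 1/(2n) * sum_i (y_i - x_i w^T)^2
        r = y - X @ w
        return (r @ r) / (2 * len(y))

    def batch_gd(X, y, alpha=0.01, max_iter=100):
        # One update per pass, using the gradient over the whole dataset.
        w = np.zeros(X.shape[1])
        errors = []
        for _ in range(max_iter):
            grad = -(X.T @ (y - X @ w)) / len(y)  # dE/dw
            w = w - alpha * grad                  # w_j := w_j - alpha * dE/dw_j
            errors.append(cost(X, y, w))
        return w, errors

    def sgd(X, y, alpha=0.01, max_iter=100):
        # One update per training sample, in a shuffled order each pass.
        w = np.zeros(X.shape[1])
        errors = []
        for _ in range(max_iter):
            for i in np.random.permutation(len(y)):
                w = w + alpha * (y[i] - X[i] @ w) * X[i]
            errors.append(cost(X, y, w))
        return w, errors

Plotting the errors lists for each α on the same axes makes the comparison in (b) direct.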






¹ Matrices are represented by a capital letter, vectors by a lowercase bold letter.















CSc 578D only, 10 pts




For the code above, we used a fixed learning rate. For this question, you should research alternative methods for setting the learning rate. Describe in detail one method that uses a learning rate schedule, and one method that is adaptive. Explain not only the update rule, but the intuition behind why the approach works. [5 pts]
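Purely as an illustration of the two families (the question asks you to research and choose your own): a step-decay schedule shrinks α on a fixed timetable, while AdaGrad adapts a per-parameter rate from accumulated gradient magnitudes. A hedged sketch:

    import numpy as np

    def step_decay(alpha0, k, drop=0.5, every=20):
        # Schedule: halve the learning rate every `every` iterations.
        return alpha0 * drop ** (k // every)

    def adagrad_step(w, grad, hist, alpha=0.1, eps=1e-8):
        # Adaptive: parameters that have accumulated large gradients take
        # smaller steps, which damps oscillation along steep directions.
        hist = hist + grad ** 2
        return w - alpha * grad / (np.sqrt(hist) + eps), hist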



SEng 474; CSc 578D: Bonus Question, 5 pts




Now we introduce L2 regularization. Explain why it is used. Implement it and repeat (b) with different values of λ, e.g. 10, 100, 1000. Remember that with regularization the cost function becomes





E(w) = \frac{1}{2n} \left( \sum_{i=1}^{n} \left( y^{(i)} - x^{(i)} w^T \right)^2 + \lambda \sum_{j=1}^{m} w_j^2 \right)








Note that by convention we don’t regularize w0. [5 pts]
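For the implementation, the only change to the batch gradient is an extra (λ/n)·w term that skips the bias. A minimal sketch, reusing the shape of batch_gd from above (names are ours, not the skeleton's):

    import numpy as np

    def ridge_grad(X, y, w, lam):
        # Gradient of the regularized cost; w_0 is excluded from the penalty.
        n = len(y)
        grad = -(X.T @ (y - X @ w)) / n
        reg = (lam / n) * w
        reg[0] = 0.0  # by convention, do not regularize w_0
        return grad + reg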




3. Gradient Descent: Logistic Regression




(SEng 474; CSc 578D: 15 points)




For this question, we will implement logistic regression and test it using the breast cancer dataset and the code skeleton in logreg.py. You only need to implement batch gradient descent. This time, plot the log-likelihood function at each iteration:




L(w) = \sum_i y_i \left( w^T x_i \right) - \sum_i \ln\left( 1 + e^{w^T x_i} \right) \qquad (1)




Test it with a few different learning rates, e.g. α = 0.001, 0.01, 0.1. Explain what is happening. [15 pts]
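A hedged sketch of batch gradient ascent on (1): the gradient is Σ_i (y_i − σ(w^T x_i)) x_i, so each step pushes the predicted probabilities toward the labels. It assumes X carries the bias column; names are illustrative, not logreg.py's.

    import numpy as np

    def log_likelihood(X, y, w):
        # L(w) = sum_i y_i (w^T x_i) - sum_i ln(1 + e^{w^T x_i})
        z = X @ w
        return np.sum(y * z) - np.sum(np.logaddexp(0.0, z))

    def logreg_batch(X, y, alpha=0.001, max_iter=100):
        w = np.zeros(X.shape[1])
        lls = []
        for _ in range(max_iter):
            p = 1.0 / (1.0 + np.exp(-(X @ w)))  # sigmoid predictions
            w = w + alpha * X.T @ (y - p)       # ascend the gradient of L(w)
            lls.append(log_likelihood(X, y, w))
        return w, lls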
