HW 3 Solution

    • Instructions:

You may form small groups (e.g. of up to four people) to work on this assignment, but you must write up all solutions by yourself. List your study partners for the homework on the first page, or "none" if you had no partners.

Keep all responses brief, a few sentences at most. Show all work for full credit.

Start each problem on a new page, and be sure to clearly label where each problem and subproblem begins. All problems must be submitted in order (all of P1 before P2, etc.).

No late homeworks will be accepted. This is not out of a desire to be harsh, but rather out of fairness to all students in this large course.














































CS 188,      HW 3     


    • Overfitting

Overfitting is a common problem in data science work.

    (a) How can you tell if a model you have trained is overfitting?

    (b) Why would it be bad to deploy a model that’s been overfitted to your training data?

    (c) Briefly describe how overfitting can occur.

    (d) In lecture and discussion we’ve discussed multiple methods for dealing with overfitting. One such technique was regularization. Please describe two different regularization techniques and explain how they help to mitigate overfitting.
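
As a minimal sketch of what regularization does in part (d), the snippet below fits a one-weight model y ≈ w·x with an L2 (ridge) penalty and with an L1 (lasso) penalty. The function names, the toy data, and the penalty strengths are illustrative assumptions and not part of the assignment; both closed forms hold only for this single-weight, no-intercept case.

```python
import math


def _sums(xs, ys):
    """Return (sum of x*y, sum of x*x) for paired samples."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy, sxx


def fit_unregularized(xs, ys):
    """Least-squares weight for y ~ w * x (no intercept)."""
    sxy, sxx = _sums(xs, ys)
    return sxy / sxx


def fit_l2(xs, ys, lam):
    """Minimize 0.5 * sum((y - w*x)^2) + 0.5 * lam * w^2."""
    sxy, sxx = _sums(xs, ys)
    return sxy / (sxx + lam)


def fit_l1(xs, ys, lam):
    """Minimize 0.5 * sum((y - w*x)^2) + lam * |w| by soft-thresholding."""
    sxy, sxx = _sums(xs, ys)
    shrunk = max(abs(sxy) - lam, 0.0)  # the weight becomes exactly 0 when lam >= |sxy|
    return math.copysign(shrunk, sxy) / sxx


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0]   # hypothetical toy inputs
    ys = [2.1, 3.9, 6.2]   # hypothetical noisy targets
    print("no penalty:", fit_unregularized(xs, ys))
    for lam in (1.0, 10.0, 40.0):  # hypothetical penalty strengths
        print("lam =", lam, "L2:", fit_l2(xs, ys, lam), "L1:", fit_l1(xs, ys, lam))
```

Both penalties pull the weight toward zero, which limits how closely the model can chase noise in the training data; the L1 form can set a weight to exactly zero, while the L2 form only shrinks it.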

    • Overfitting Mitigations

For each of the below strategies, state whether or not it might help to mitigate overfitting and explain why.

    (a) Using a smaller training dataset

    (b) Restricting the maximum value any parameter can take on

    (c) Training your neural network for longer (more iterations)

    (d) Training a model with more parameters

    (e) Randomly setting the outputs of 50% of the nodes in your neural network to zero

    (f) Incorporating additional sparse features into your model

    (g) Initializing your parameters randomly instead of to zero

    (h) Training your model on a graphics processing unit or specialized accelerator chip instead of a CPU
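
Strategy (e) is usually called dropout. The following is a rough sketch of one common variant, "inverted" dropout, in which the surviving outputs are rescaled during training so that their expected value is unchanged. The function name, its parameters, and the rescaling choice are assumptions made for illustration; the assignment itself only describes zeroing 50% of the outputs.

```python
import random


def dropout(activations, drop_prob=0.5, training=True, rng=None):
    """Zero each activation with probability drop_prob (training only).

    Survivors are divided by the keep probability (the 'inverted dropout'
    convention, assumed here) so the expected output matches the input.
    """
    if not training or drop_prob == 0.0:
        return list(activations)  # at evaluation time nothing is dropped
    rng = rng or random.Random()
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


if __name__ == "__main__":
    hidden = [0.2, 0.9, 0.5, 0.7]  # hypothetical hidden-layer outputs
    print(dropout(hidden, drop_prob=0.5, rng=random.Random(0)))
    print(dropout(hidden, training=False))
```

Because a different random subset of nodes is silenced on every pass, no single node can be relied on exclusively, which is the usual argument for why this discourages memorizing the training set.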

    • K-Nearest Neighbors

    (a) Why is it important to normalize your data when using the k-nearest neighbors algorithm (KNN)?

    (b) When doing k-nearest neighbors, an odd value for k is typically used. Why is this?

    (c) Say you have the dataset in Table 1, where x and y are features and L is the label. Normalize the features by scaling them so that all values in a column lie in the range [0, 1], so that they can be used with KNN.

    (d) Use KNN with k = 3 to make predictions for the data points in Table 2.

    (e) Use KNN with k = 1 to make predictions for the data points in Table 2.

    (f) Use KNN with k = 5 to make predictions for the data points in Table 2.

    (g) What is a potential issue with using a low k for KNN?

    (h) What is a potential issue with selecting a k that is too high when doing KNN?






x      y      L
0.2    350    0
0.1    750    0
0.3    700    0
0.1    500    0
0.2    500    0
1.2    400    1
0.9    410    1
1.1    390    1
0.1    760    1

Table 1: Raw KNN data for training

x      y      Lpred
0.1    760
0.2    700
0.6    200

Table 2: Raw KNN data for evaluation


    (i) Say you had a dataset and wanted to understand how well KNN with a k of 3 performed on it. How could you quantify its performance? Assume you do not have access to any samples beyond those in your dataset.

    (j) Say you had a dataset consisting of 200 samples. How might you select the optimal k to use for KNN?
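
The KNN steps in this problem can be sketched in a few lines of standard-library Python. The snippet below min-max scales the Table 1 features, classifies the Table 2 points by majority vote under Euclidean distance, and estimates accuracy with leave-one-out evaluation as one possible way to compare values of k. The distance metric, the helper names, and the decision to reuse the training ranges for the evaluation points (and to compute them once rather than per fold) are assumptions, not requirements stated in the assignment.

```python
import math
from collections import Counter

# Rows of Table 1 as (x, y, L) and rows of Table 2 as (x, y).
TRAIN = [(0.2, 350, 0), (0.1, 750, 0), (0.3, 700, 0), (0.1, 500, 0), (0.2, 500, 0),
         (1.2, 400, 1), (0.9, 410, 1), (1.1, 390, 1), (0.1, 760, 1)]
QUERIES = [(0.1, 760), (0.2, 700), (0.6, 200)]


def fit_min_max(points):
    """Return one (min, max) pair per feature column."""
    return [(min(col), max(col)) for col in zip(*points)]


def scale(point, ranges):
    """Map each feature to (v - min) / (max - min) using the fitted ranges."""
    return tuple((v - lo) / (hi - lo) for v, (lo, hi) in zip(point, ranges))


def knn_predict(train_X, train_y, query, k):
    """Majority label among the k training points closest to query."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(X, y, k):
    """Hold out each sample in turn and predict it from the remaining ones."""
    hits = 0
    for i in range(len(X)):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)


features = [(x, y) for x, y, _ in TRAIN]
labels = [label for _, _, label in TRAIN]
ranges = fit_min_max(features)
train_scaled = [scale(p, ranges) for p in features]
# Evaluation points reuse the training ranges, so they may fall outside [0, 1].
query_scaled = [scale(q, ranges) for q in QUERIES]

if __name__ == "__main__":
    for k in (1, 3, 5):
        preds = [knn_predict(train_scaled, labels, q, k) for q in query_scaled]
        print("k =", k, "predictions:", preds,
              "leave-one-out accuracy:", leave_one_out_accuracy(train_scaled, labels, k))
```

With only nine samples the leave-one-out estimate is very coarse; on a larger dataset, such as the 200 samples in part (j), the same loop or a k-fold split would give a steadier comparison across candidate values of k.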

    • Principal Component Analysis

For each of the below situations, state whether or not PCA would work well, and briefly explain why.

    (a) Data with a linear structure

    (b) Data lying on a hyperbolic plane

    (c) A dataset containing non-normalized features

    (d) A dataset where each feature is statistically independent of all others
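
To make situations (a) and (c) concrete, here is a small sketch of PCA restricted to two features, where the eigen-decomposition of the covariance matrix has a closed form. The function name `pca_2d`, the synthetic data, and the noise levels are assumptions chosen for illustration; a real analysis would more likely use a linear-algebra library.

```python
import math
import random


def pca_2d(points):
    """Return (unit vector of the first principal component, its variance share)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the 2x2 sample covariance matrix [[a, b], [b, c]].
    a = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    c = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    mid = (a + c) / 2
    rad = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = mid + rad, mid - rad  # eigenvalues, largest first
    if b != 0:
        vx, vy = b, lam1 - a           # eigenvector belonging to lam1
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm), lam1 / (lam1 + lam2)


rng = random.Random(0)
# Hypothetical data with linear structure: y is roughly 2 * x.
ts = [rng.uniform(-1, 1) for _ in range(200)]
line_points = [(t, 2 * t + rng.gauss(0, 0.05)) for t in ts]
line_dir, line_share = pca_2d(line_points)

# Hypothetical independent features on very different scales (not normalized).
raw_points = [(rng.gauss(0, 1), rng.gauss(0, 1000)) for _ in range(200)]
raw_dir, raw_share = pca_2d(raw_points)

if __name__ == "__main__":
    print("linear data:", line_dir, line_share)
    print("unscaled data:", raw_dir, raw_share)
```

In the second dataset the first component simply follows the feature with the largest numeric scale, which is one way to see why normalization matters before applying PCA.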





















    • Artificial Neural Networks

Consider the following computation graph for a simple neural network for binary classification. Here x1 and x2 are input features and y is their associated class. The network has multiple parameters, including weights w and biases b, as well as non-linearity function g. The network will output a value ypred, representing the probability that a sample belongs to class 1. We use a loss function Loss to help train our model. The network is initialized with the parameters in Table 3.

You will first train the model using some sample datapoints and then evaluate its performance.

For any questions that ask for performance metrics, generate them using the samples in Table 4.

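[Figure: computation graph with inputs x1 and x2; weights w0 to w5; biases b0, b1, and b2; three applications of g; and output ypred, which feeds Loss together with y]

Figure 1: Neural network architecture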


    (a) Initialize the neural network with the parameters in Table 3 and then train it using the samples in Table 4. Use gradient descent (forward pass followed by backpropagation) to update the parameters.

Suppose the loss function is quadratic, $\mathrm{Loss}(y, y_{pred}) = \frac{1}{2}(y - y_{pred})^2$, and g is the sigmoid function $g(z) = \frac{1}{1 + e^{-z}}$ (note: it’s typically better to use a different type of loss, cross-entropy, for classification problems, but using a quadratic loss function keeps the math easier for this assignment).

Assume that your learning rate is 0.1.

Pass the samples through the network one at a time in order from top to bottom. Report the final values of all parameters. Show all work.

    (b) What is your model’s accuracy?

(c) What is your model’s precision?

    (d) What is your model’s recall?

(e) What is your model’s F1 score?
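
The forward pass and backpropagation in part (a) can be sketched as below. Note the assumption about wiring: the text does not spell out which weight sits on which edge, so this sketch assumes hidden unit h0 = g(w0·x1 + w1·x2 + b0), hidden unit h1 = g(w2·x1 + w3·x2 + b1), and ypred = g(w4·h0 + w5·h1 + b2). If Figure 1 connects the weights differently, the update rules change accordingly. The initial values and samples are taken from Tables 3 and 4; the function names are hypothetical.

```python
import math


def g(z):
    """Sigmoid non-linearity."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(p, x1, x2):
    """Return (h0, h1, y_pred) under the ASSUMED wiring described above."""
    h0 = g(p["w0"] * x1 + p["w1"] * x2 + p["b0"])
    h1 = g(p["w2"] * x1 + p["w3"] * x2 + p["b1"])
    y_pred = g(p["w4"] * h0 + p["w5"] * h1 + p["b2"])
    return h0, h1, y_pred


def loss(y, y_pred):
    """Quadratic loss 0.5 * (y - y_pred)^2."""
    return 0.5 * (y - y_pred) ** 2


def gradients(p, x1, x2, y):
    """Backpropagate the quadratic loss through the assumed network."""
    h0, h1, y_pred = forward(p, x1, x2)
    d_out = (y_pred - y) * y_pred * (1 - y_pred)  # dLoss / d(output pre-activation)
    d_h0 = d_out * p["w4"] * h0 * (1 - h0)        # dLoss / d(h0 pre-activation)
    d_h1 = d_out * p["w5"] * h1 * (1 - h1)        # dLoss / d(h1 pre-activation)
    return {
        "w4": d_out * h0, "w5": d_out * h1, "b2": d_out,
        "w0": d_h0 * x1, "w1": d_h0 * x2, "b0": d_h0,
        "w2": d_h1 * x1, "w3": d_h1 * x2, "b1": d_h1,
    }


def sgd_step(p, x1, x2, y, lr=0.1):
    """One gradient-descent update on a single sample."""
    grads = gradients(p, x1, x2, y)
    return {name: value - lr * grads[name] for name, value in p.items()}


# Table 3 (initial parameters) and Table 4 (training samples as x1, x2, y).
initial = {"b0": 1.0, "b1": -6.0, "b2": -3.93,
           "w0": 3.0, "w1": 4.0, "w2": 6.0, "w3": 5.0, "w4": 2.0, "w5": 4.0}
samples = [(0, 0, 0), (1, 1, 1), (0, 1, 1)]

params = dict(initial)
for x1, x2, y in samples:  # one sample at a time, top to bottom
    params = sgd_step(params, x1, x2, y, lr=0.1)

if __name__ == "__main__":
    for name in sorted(params):
        print(name, params[name])
```

Each update uses the chain rule with the sigmoid derivative g'(z) = g(z)(1 - g(z)); checking a hand-derived gradient against a small finite-difference estimate of the loss is a cheap way to catch sign errors.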




 










Parameter    Initial value
b0           1
b1           -6
b2           -3.93
w0           3
w1           4
w2           6
w3           5
w4           2
w5           4

Table 3: ANN initial state

x1    x2    y
0     0     0
1     1     1
0     1     1

Table 4: ANN training data

    (f) Plot the ROC curve.

    (g) What is your model’s AUC-ROC?
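
The metrics asked for in parts (b) through (g) can be computed from true labels and predicted probabilities along the following lines. The labels are those of Table 4, but the scores shown are placeholder values (an assumption), not the trained network's actual outputs, and the 0.5 decision threshold and the zero-division conventions are likewise assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


def roc_points(y_true, scores):
    """(FPR, TPR) pairs as the threshold sweeps down; needs both classes present."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= thr)
        fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= thr)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    y_true = [0, 1, 1]             # labels from Table 4
    scores = [0.30, 0.80, 0.60]    # PLACEHOLDER probabilities, not real model outputs
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(classification_metrics(y_true, y_pred))
    curve = roc_points(y_true, scores)
    print(curve, auc(curve))
```

Plotting the returned (FPR, TPR) pairs in order gives the ROC curve for part (f); with only three samples the curve has very few steps.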

    • Reinforcement Learning

Consider the following instance of reinforcement learning:


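[Figure: grid world with columns 1 to 4 and rows 1 to 3; the +1 cell is in row 3, the -1 cell is in row 2, and Start is in the bottom-left cell]

Figure 2: Reinforcement learning grid world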

Assume the following:

    • The agent lives in a grid

    • The agent starts in the bottom left

    • Walls (shaded grids) block the agent’s path

    • The agent’s actions do not always go as planned:

        - 80% of the time, the action North takes the agent North (if there is no wall there)

        - 10% of the time, North takes the agent West

        - 10% of the time, North takes the agent East

        - If there is a wall in the direction the agent would have taken, the agent stays put

    • γ = 0.8

    • Big rewards come at the end

    • Goal: maximize sum of rewards

Use Q-learning to find an optimal path in the grid. Show each step. State all your assumptions.
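
A tabular Q-learning loop for a grid like this one might look as follows. Several things here are assumptions rather than facts from the problem: the exact cell layout (a wall at column 2, row 2; +1 at column 4, row 3; -1 at column 4, row 2), a reward of zero on non-terminal steps, reading the 0.8 value as the discount factor, the learning rate, the exploration rate, the episode count, and the idea that every action slips to its two perpendicular directions the way North is described to.

```python
import random

COLS, ROWS = 4, 3
WALLS = {(2, 2)}                            # ASSUMED wall position
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}     # ASSUMED positions of the +1 and -1 cells
START = (1, 1)                              # bottom-left cell
GAMMA = 0.8                                 # 0.8 interpreted as the discount factor
ALPHA = 0.1                                 # hypothetical learning rate
EPSILON = 0.2                               # hypothetical exploration rate
MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
SLIPS = {"N": ("W", "E"), "S": ("E", "W"), "E": ("N", "S"), "W": ("S", "N")}


def step(state, action, rng):
    """Apply a noisy move: 80% intended, 10% each perpendicular direction."""
    roll = rng.random()
    if roll < 0.8:
        actual = action
    elif roll < 0.9:
        actual = SLIPS[action][0]
    else:
        actual = SLIPS[action][1]
    dx, dy = MOVES[actual]
    nxt = (state[0] + dx, state[1] + dy)
    if nxt in WALLS or not (1 <= nxt[0] <= COLS and 1 <= nxt[1] <= ROWS):
        nxt = state  # blocked moves leave the agent in place
    return nxt, TERMINALS.get(nxt, 0.0), nxt in TERMINALS


def q_learning(episodes=2000, max_steps=200, seed=0):
    """Return a dict mapping (state, action) to the learned Q-value."""
    rng = random.Random(seed)
    states = [(c, r) for c in range(1, COLS + 1) for r in range(1, ROWS + 1)
              if (c, r) not in WALLS]
    q = {(s, a): 0.0 for s in states for a in MOVES}
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            if rng.random() < EPSILON:
                action = rng.choice(list(MOVES))            # explore
            else:
                best = max(q[(state, a)] for a in MOVES)    # exploit, random tie-break
                action = rng.choice([a for a in MOVES if q[(state, a)] == best])
            nxt, reward, done = step(state, action, rng)
            future = 0.0 if done else GAMMA * max(q[(nxt, a)] for a in MOVES)
            q[(state, action)] += ALPHA * (reward + future - q[(state, action)])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q):
    """Best action per non-terminal state according to the learned Q-values."""
    states = {s for s, _ in q if s not in TERMINALS}
    return {s: max(MOVES, key=lambda a: q[(s, a)]) for s in states}


if __name__ == "__main__":
    q_table = q_learning()
    for cell, action in sorted(greedy_policy(q_table).items()):
        print(cell, action)
```

Following the greedy action from Start cell by cell traces the learned path; a written solution would also show the individual Q-value updates that this loop performs.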




    • (Money)ball So Hard

Watch the 2011 movie Moneyball (directed by Bennett Miller and written by Steven Zaillian and Aaron Sorkin).

    (a) Write a detailed formulation of the problem discussed in the movie and outline, step-by-step, how you would go about solving the problem (data collection, data cleaning, ML, and all other steps).

    (b) Discuss what can go wrong when using this method in practice and how it can be improved.




















































