CNN-Layers Solution


0.1    Convolutional neural network layers

In this notebook, we will build the convolutional neural network layers. This will be followed by a spatial batchnorm, and then in the final notebook of this assignment, we will train a CNN to further improve the validation accuracy on CIFAR-10.

CS231n has built a solid API for building these modular frameworks and training them, and we will use their very well implemented framework as opposed to “reinventing the wheel.” This includes using their Solver, various utility functions, their layer structure, and their implementation of fast CNN layers. This also includes nndl.fc_net, nndl.layers, and nndl.layer_utils. As in prior assignments, we thank Serena Yeung & Justin Johnson for permission to use code written for the CS 231n class (cs231n.stanford.edu).


[2]:  ## Import and setups

import time

import numpy as np

import matplotlib.pyplot as plt

from nndl.conv_layers import *

from cs231n.data_utils import get_CIFAR10_data

from cs231n.gradient_check import eval_numerical_gradient, eval_numerical_gradient_array
from cs231n.solver import Solver

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0)  # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

def rel_error(x, y):
    """ returns relative error """
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))



0.2    Implementing CNN layers

Just as we implemented modular layers for fully connected networks, batch normalization, and dropout, we’ll want to implement modular layers for convolutional neural networks. These layers are in nndl/conv_layers.py.

0.2.1    Convolutional forward pass

Begin by implementing a naive version of the forward pass of the CNN that uses for loops. This function is conv_forward_naive in nndl/conv_layers.py. Don’t worry about efficiency of implementation. Later on, we provide a fast implementation of these layers. This version ought to test your understanding of convolution. In our implementation, there is a triple for loop.
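As a quick check of the output-shape formula (an illustrative snippet, not part of the assignment code): with a 4x4 input, 4x4 filters, stride 2, and pad 1 as in the test cell below, each filter produces a 2x2 output.

H, W, HH, WW, stride, pad = 4, 4, 4, 4, 2, 1
H_out = 1 + (H + 2 * pad - HH) // stride  # = 2
W_out = 1 + (W + 2 * pad - WW) // stride  # = 2
print(H_out, W_out)  # 2 2, so out.shape is (2, 3, 2, 2) in the cell below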

After you implement conv_forward_naive, test your implementation by running the cell below.


[3]:  x_shape = (2, 3, 4, 4)
w_shape = (3, 3, 4, 4)
x = np.linspace(-0.1, 0.5, num=np.prod(x_shape)).reshape(x_shape)
w = np.linspace(-0.2, 0.3, num=np.prod(w_shape)).reshape(w_shape)
b = np.linspace(-0.1, 0.2, num=3)

conv_param = {'stride': 2, 'pad': 1}
out, _ = conv_forward_naive(x, w, b, conv_param)
correct_out = np.array([[[[-0.08759809, -0.10987781],
                          [-0.18387192, -0.2109216 ]],
                         [[ 0.21027089,  0.21661097],
                          [ 0.22847626,  0.23004637]],
                         [[ 0.50813986,  0.54309974],
                          [ 0.64082444,  0.67101435]]],
                        [[[-0.98053589, -1.03143541],
                          [-1.19128892, -1.24695841]],
                         [[ 0.69108355,  0.66880383],
                          [ 0.59480972,  0.56776003]],
                         [[ 2.36270298,  2.36904306],
                          [ 2.38090835,  2.38247847]]]])

# Compare your output to ours; difference should be around 1e-8
print('Testing conv_forward_naive')
print('difference: ', rel_error(out, correct_out))

Testing conv_forward_naive

difference:    2.2121476417505994e-08

0.2.2    Convolutional backward pass

Now, implement a naive version of the backward pass of the CNN. The function is conv_backward_naive in nndl/conv_layers.py. Don’t worry about efficiency of implementation. Later on, we provide a fast implementation of these layers. This version ought to test your understanding of convolution. In our implementation, there is a quadruple for loop.
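For reference, the gradients accumulated inside the quadruple loop follow directly from the forward pass out[n, f, i, j] = sum(patch * w[f]) + b[f]. A minimal sketch under that convention (the name conv_backward_sketch and its exact structure are illustrative, not the required conv_backward_naive):

def conv_backward_sketch(dout, x, w, b, conv_param):
    # Illustrative backward pass mirroring the naive forward pass above.
    stride, pad = conv_param['stride'], conv_param['pad']
    N, F, H_out, W_out = dout.shape
    _, _, HH, WW = w.shape
    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), 'constant')
    dx_pad, dw, db = np.zeros_like(x_pad), np.zeros_like(w), np.zeros_like(b)
    for n in range(N):
        for f in range(F):
            db[f] += np.sum(dout[n, f])
            for i in range(H_out):
                for j in range(W_out):
                    hs, ws = i * stride, j * stride
                    patch = x_pad[n, :, hs:hs + HH, ws:ws + WW]
                    dw[f] += patch * dout[n, f, i, j]                                 # dL/dw accumulates input patches
                    dx_pad[n, :, hs:hs + HH, ws:ws + WW] += w[f] * dout[n, f, i, j]   # dL/dx accumulates filter weights
    dx = dx_pad[:, :, pad:pad + x.shape[2], pad:pad + x.shape[3]]                     # strip the zero-padding
    return dx, dw, db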


After you implement conv_backward_naive, test your implementation by running the cell below.


[4]:  x = np.random.randn(4, 3, 5, 5)

w = np.random.randn(2, 3, 3, 3)

b = np.random.randn(2,)

dout = np.random.randn(4, 2, 5, 5)

conv_param = {'stride': 1, 'pad': 1}

out, cache = conv_forward_naive(x,w,b,conv_param)

dx_num = eval_numerical_gradient_array(lambda x: conv_forward_naive(x, w, b, conv_param)[0], x, dout)

dw_num = eval_numerical_gradient_array(lambda w: conv_forward_naive(x, w, b, conv_param)[0], w, dout)

db_num = eval_numerical_gradient_array(lambda b: conv_forward_naive(x, w, b, conv_param)[0], b, dout)

out, cache = conv_forward_naive(x, w, b, conv_param)

dx, dw, db = conv_backward_naive(dout, cache)

# Your errors should be around 1e-9

print('Testing conv_backward_naive function')

print('dx error: ', rel_error(dx, dx_num))

print('dw error: ', rel_error(dw, dw_num))

print('db error: ', rel_error(db, db_num))

Testing conv_backward_naive function

dx error:    1.0

dw error:    2.8070873679848205e-10

db error:    2.2766771005533667e-11

0.2.3    Max pool forward pass

In this section, we will implement the forward pass of the max pool. The function is max_pool_forward_naive in nndl/conv_layers.py. Do not worry about the efficiency of implementation.

After you implement max_pool_forward_naive, test your implementation by running the cell below.


[5]:  x_shape = (2, 3, 4, 4)
x = np.linspace(-0.3, 0.4, num=np.prod(x_shape)).reshape(x_shape)
pool_param = {'pool_width': 2, 'pool_height': 2, 'stride': 2}

out, _ = max_pool_forward_naive(x, pool_param)

correct_out = np.array([[[[-0.26315789, -0.24842105],
                          [-0.20421053, -0.18947368]],
                         [[-0.14526316, -0.13052632],
                          [-0.08631579, -0.07157895]],
                         [[-0.02736842, -0.01263158],
                          [ 0.03157895,  0.04631579]]],
                        [[[ 0.09052632,  0.10526316],
                          [ 0.14947368,  0.16421053]],
                         [[ 0.20842105,  0.22315789],
                          [ 0.26736842,  0.28210526]],
                         [[ 0.32631579,  0.34105263],
                          [ 0.38526316,  0.4       ]]]])

# Compare your output with ours. Difference should be around 1e-8.
print('Testing max_pool_forward_naive function:')
print('difference: ', rel_error(out, correct_out))

Testing max_pool_forward_naive function:

difference:    4.1666665157267834e-08

0.2.4    Max pool backward pass

In this section, you will implement the backward pass of the max pool. The function is max_pool_backward_naive in nndl/conv_layers.py. Do not worry about the efficiency of implementation.
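The key idea is that, within each pooling window, the upstream gradient is routed only to the location(s) that achieved the maximum in the forward pass. A tiny standalone illustration (not part of the assignment code):

window = np.array([[1.0, 3.0],
                   [2.0, 0.5]])
mask = (window == np.max(window))  # boolean mask selecting the max entry
print(mask * 5.0)                  # an upstream gradient of 5.0 flows only to the max
# [[0. 5.]
#  [0. 0.]]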

After you implement max_pool_backward_naive, test your implementation by running the cell below.


[6]:  x = np.random.randn(3, 2, 8, 8)

dout = np.random.randn(3, 2, 4, 4)

pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}

dx_num = eval_numerical_gradient_array(lambda x: max_pool_forward_naive(x, pool_param)[0], x, dout)

out, cache = max_pool_forward_naive(x, pool_param)

dx = max_pool_backward_naive(dout, cache)

# Your error should be around 1e-12

print('Testing max_pool_backward_naive function:')

print('dx error: ', rel_error(dx, dx_num))

Testing max_pool_backward_naive function:

dx error:    3.2756387806169813e-12

0.3    Fast implementation of the CNN layers

Implementing fast versions of the CNN layers can be difficult. We will provide you with the fast layers implemented by cs231n. They are provided in cs231n/fast_layers.py.

The fast convolution implementation depends on a Cython extension; to compile it you need to run the following from the cs231n directory:

python setup.py build_ext --inplace

NOTE: The fast implementation for pooling will only perform optimally if the pooling regions are non-overlapping and tile the input. If these conditions are not met then the fast pooling implementation will not be much faster than the naive implementation.
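For reference, here is a quick standalone check of that condition (illustrative only; the actual dispatch logic lives inside cs231n/fast_layers.py):

H, W = 32, 32                             # input spatial size
pool_height, pool_width, stride = 2, 2, 2
pool_tiles_input = (pool_height == stride and pool_width == stride
                    and H % pool_height == 0 and W % pool_width == 0)
print(pool_tiles_input)  # True: 2x2 pooling with stride 2 is non-overlapping and tiles a 32x32 input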

You can compare the performance of the naive and fast versions of these layers by running the cell below.

You should see pretty drastic speedups in the implementation of these layers. On our machine, the forward pass speeds up by 17x and the backward pass speeds up by 840x. Of course, these numbers will vary from machine to machine, as well as on your precise implementation of the naive layers.


[7]:  from cs231n.fast_layers import conv_forward_fast, conv_backward_fast
from time import time

x = np.random.randn(100, 3, 31, 31)
w = np.random.randn(25, 3, 3, 3)
b = np.random.randn(25,)
dout = np.random.randn(100, 25, 16, 16)
conv_param = {'stride': 2, 'pad': 1}

t0 = time()

out_naive, cache_naive = conv_forward_naive(x, w, b, conv_param)

t1 = time()

out_fast, cache_fast = conv_forward_fast(x, w, b, conv_param)

t2 = time()

print('Testing conv_forward_fast:')

print('Naive: %fs' % (t1 - t0))

print('Fast: %fs' % (t2 - t1))

print('Speedup: %fx' % ((t1 - t0) / (t2 - t1)))

print('Difference: ', rel_error(out_naive, out_fast))

t0 = time()

dx_naive, dw_naive, db_naive = conv_backward_naive(dout, cache_naive)

t1 = time()

dx_fast, dw_fast, db_fast = conv_backward_fast(dout, cache_fast)

t2 = time()

print('\nTesting conv_backward_fast:')

print('Naive: %fs' % (t1 - t0))

print('Fast: %fs' % (t2 - t1))

print('Speedup: %fx' % ((t1 - t0) / (t2 - t1)))

print('dx difference: ', rel_error(dx_naive, dx_fast))

print('dw difference: ', rel_error(dw_naive, dw_fast))

print('db difference: ', rel_error(db_naive, db_fast))

Testing conv_forward_fast:


Naive: 7.645994s

Fast: 0.014226s

Speedup: 537.476729x

Difference:    2.3164423283468067e-11

Testing conv_backward_fast:

Naive: 10.050777s

Fast: 0.012206s

Speedup: 823.440082x

dx difference:    1.0

dw difference:    1.456656519088377e-11

db difference:    0.0


[8]:  from cs231n.fast_layers import max_pool_forward_fast, max_pool_backward_fast

x = np.random.randn(100, 3, 32, 32)

dout = np.random.randn(100, 3, 16, 16)

pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}

t0 = time()

out_naive, cache_naive = max_pool_forward_naive(x, pool_param)

t1 = time()

out_fast, cache_fast = max_pool_forward_fast(x, pool_param)

t2 = time()

print('Testing pool_forward_fast:')

print('Naive: %fs' % (t1 - t0))

print('fast: %fs' % (t2 - t1))

print('speedup: %fx' % ((t1 - t0) / (t2 - t1)))

print('difference: ', rel_error(out_naive, out_fast))

t0 = time()

dx_naive = max_pool_backward_naive(dout, cache_naive)

t1 = time()

dx_fast = max_pool_backward_fast(dout, cache_fast)

t2 = time()

print('\nTesting pool_backward_fast:')

print('Naive: %fs' % (t1 - t0))

print('speedup: %fx' % ((t1 - t0) / (t2 - t1)))

print('dx difference: ', rel_error(dx_naive, dx_fast))

Testing pool_forward_fast:

Naive: 0.527140s

fast: 0.006076s

speedup: 86.759771x

difference:    0.0



Testing pool_backward_fast:

Naive: 1.583779s

speedup: 89.626550x

dx difference:    0.0

0.4    Implementation of cascaded layers

We’ve provided the following functions in nndl/conv_layer_utils.py:

    • conv_relu_forward
    • conv_relu_backward
    • conv_relu_pool_forward
    • conv_relu_pool_backward

These use the fast implementations of the conv net layers. You can test them below:


[12]:  from nndl.conv_layer_utils import conv_relu_pool_forward, conv_relu_pool_backward

x = np.random.randn(2, 3, 16, 16)

w = np.random.randn(3, 3, 3, 3)

b = np.random.randn(3,)

dout = np.random.randn(2, 3, 8, 8)

conv_param = {'stride': 1, 'pad': 1}

pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}

out, cache = conv_relu_pool_forward(x, w, b, conv_param, pool_param)

dx, dw, db = conv_relu_pool_backward(dout, cache)

dx_num = eval_numerical_gradient_array(lambda x: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], x, dout)

dw_num = eval_numerical_gradient_array(lambda w: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], w, dout)

db_num = eval_numerical_gradient_array(lambda b: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], b, dout)

print('Testing conv_relu_pool')

print('dx error: ', rel_error(dx_num, dx))

print('dw error: ', rel_error(dw_num, dw))

print('db error: ', rel_error(db_num, db))

Testing conv_relu_pool

dx error:    1.1765335829016865e-08

dw error:    9.132958239361986e-10

db error:    7.33615166792925e-12


[13]:  from nndl.conv_layer_utils import conv_relu_forward, conv_relu_backward

x = np.random.randn(2, 3, 8, 8)

w = np.random.randn(3, 3, 3, 3)

b = np.random.randn(3,)

dout = np.random.randn(2, 3, 8, 8)




conv_param = {'stride': 1, 'pad': 1}

out, cache = conv_relu_forward(x, w, b, conv_param)

dx, dw, db = conv_relu_backward(dout, cache)

dx_num = eval_numerical_gradient_array(lambda x: conv_relu_forward(x, w, b, conv_param)[0], x, dout)

dw_num = eval_numerical_gradient_array(lambda w: conv_relu_forward(x, w, b, conv_param)[0], w, dout)

db_num = eval_numerical_gradient_array(lambda b: conv_relu_forward(x, w, b, conv_param)[0], b, dout)

print('Testing conv_relu:')

print('dx error: ', rel_error(dx_num, dx))

print('dw error: ', rel_error(dw_num, dw))

print('db error: ', rel_error(db_num, db))

Testing conv_relu:

dx error:    1.1425877178093623e-09

dw error:    1.9940987496613484e-10

db error:    1.991382404345138e-11

0.5    What next?

We saw how helpful batch normalization was for training FC nets. In the next notebook, we’ll implement batch normalization for convolutional neural networks, and then finish off by implementing a CNN to improve our validation accuracy on CIFAR-10.

1    conv_layers.py


[ ]:  import numpy as np

from nndl.layers import *

import pdb

"""

This code was originally written for CS 231n at Stanford University

(cs231n.stanford.edu).    It has been modified in various areas for use in the

ECE 239AS class at UCLA.    This includes the descriptions of what code to

implement as well as some slight potential changes in variable names to be

consistent with class nomenclature.    We thank Justin Johnson & Serena Yeung for

permission to use this code.    To see the original version, please visit

cs231n.stanford.edu.

"""

def conv_forward_naive(x, w, b, conv_param):
  """
  A naive implementation of the forward pass for a convolutional layer.

  The input consists of N data points, each with C channels, height H and width W.
  We convolve each input with F different filters, where each filter spans all C
  channels and has height HH and width WW.

  Input:
  - x: Input data of shape (N, C, H, W)
  - w: Filter weights of shape (F, C, HH, WW)
  - b: Biases, of shape (F,)
  - conv_param: A dictionary with the following keys:
    - 'stride': The number of pixels between adjacent receptive fields in the
      horizontal and vertical directions.
    - 'pad': The number of pixels that will be used to zero-pad the input.

  Returns a tuple of:
  - out: Output data, of shape (N, F, H', W') where H' and W' are given by
    H' = 1 + (H + 2 * pad - HH) / stride
    W' = 1 + (W + 2 * pad - WW) / stride
  - cache: (x, w, b, conv_param)
  """
  out = None
  pad = conv_param['pad']
  stride = conv_param['stride']

  # ================================================================ #
  # YOUR CODE HERE:
  #   Implement the forward pass of a convolutional neural network.
  #   Store the output as 'out'.
  #   Hint: to pad the array, you can use the function np.pad.
  # ================================================================ #

  x_padded = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), 'constant')
  N, _, H, W = x_padded.shape
  F, _, HH, WW = w.shape
  H_out = int(1 + (H - HH) / stride)
  W_out = int(1 + (W - WW) / stride)
  out = np.zeros((N, F, H_out, W_out))

  for pt_idx in range(N):
    for filter_idx in range(F):
      for x_idx in range(W_out):
        for y_idx in range(H_out):
          x_start = x_idx * stride
          y_start = y_idx * stride
          patch = x_padded[pt_idx, :, y_start:y_start+HH, x_start:x_start+WW]
          convolved = np.sum(np.multiply(patch, w[filter_idx])) + b[filter_idx]
          out[pt_idx, filter_idx, y_idx, x_idx] = convolved

  # ================================================================ #
  # END YOUR CODE HERE
  # ================================================================ #

  cache = (x, w, b, conv_param)
  return out, cache


def conv_backward_naive(dout, cache):
  """
  A naive implementation of the backward pass for a convolutional layer.

  Inputs:
  - dout: Upstream derivatives.
  - cache: A tuple of (x, w, b, conv_param) as in conv_forward_naive

  Returns a tuple of:
  - dx: Gradient with respect to x
  - dw: Gradient with respect to w
  - db: Gradient with respect to b
  """
  dx, dw, db = None, None, None

  N, F, out_height, out_width = dout.shape
  x, w, b, conv_param = cache

  stride, pad = [conv_param['stride'], conv_param['pad']]
  xpad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')
  num_filts, _, f_height, f_width = w.shape

  # ================================================================ #
  # YOUR CODE HERE:
  #   Implement the backward pass of a convolutional neural network.
  #   Calculate the gradients: dx, dw, and db.
  # ================================================================ #

  dx = np.zeros(xpad.shape)
  dw = np.zeros(w.shape)
  db = np.zeros(b.shape)

  for pt_idx in range(N):
    for filter_idx in range(F):
      db[filter_idx] += np.sum(dout[pt_idx, filter_idx])
      for x_idx in range(out_width):
        for y_idx in range(out_height):
          x_start = x_idx * stride
          y_start = y_idx * stride
          patch = xpad[pt_idx, :, y_start:y_start+f_height, x_start:x_start+f_width]
          # Accumulate the input gradient: the filter weights scaled by the
          # upstream gradient flow back onto the padded input window.
          dx[pt_idx, :, y_start:y_start+f_height, x_start:x_start+f_width] += \
              w[filter_idx] * dout[pt_idx, filter_idx, y_idx, x_idx]
          dw[filter_idx] += patch * dout[pt_idx, filter_idx, y_idx, x_idx]

  # Strip the zero-padding to recover the gradient w.r.t. the original input.
  dx = dx[:, :, pad:pad+x.shape[2], pad:pad+x.shape[3]]

  # ================================================================ #
  # END YOUR CODE HERE
  # ================================================================ #

  return dx, dw, db


def max_pool_forward_naive(x, pool_param):
  """
  A naive implementation of the forward pass for a max pooling layer.

  Inputs:
  - x: Input data, of shape (N, C, H, W)
  - pool_param: dictionary with the following keys:
    - 'pool_height': The height of each pooling region
    - 'pool_width': The width of each pooling region
    - 'stride': The distance between adjacent pooling regions

  Returns a tuple of:
  - out: Output data
  - cache: (x, pool_param)
  """
  out = None

  # ================================================================ #
  # YOUR CODE HERE:
  #   Implement the max pooling forward pass.
  # ================================================================ #

  N, C, H, W = x.shape
  pool_height, pool_width, stride = [pool_param['pool_height'],
                                     pool_param['pool_width'],
                                     pool_param['stride']]
  H_out = int(1 + (H - pool_height) / stride)
  W_out = int(1 + (W - pool_width) / stride)
  out = np.zeros((N, C, H_out, W_out))

  for pt_idx in range(N):
    for channel_idx in range(C):
      for y_idx in range(H_out):
        for x_idx in range(W_out):
          x_start = x_idx * stride
          y_start = y_idx * stride
          out[pt_idx, channel_idx, y_idx, x_idx] = np.max(
              x[pt_idx, channel_idx, y_start:y_start+pool_height, x_start:x_start+pool_width])

  # ================================================================ #
  # END YOUR CODE HERE
  # ================================================================ #

  cache = (x, pool_param)
  return out, cache

def max_pool_backward_naive(dout, cache):
  """
  A naive implementation of the backward pass for a max pooling layer.

  Inputs:
  - dout: Upstream derivatives
  - cache: A tuple of (x, pool_param) as in the forward pass.

  Returns:
  - dx: Gradient with respect to x
  """
  dx = None
  x, pool_param = cache
  pool_height, pool_width, stride = pool_param['pool_height'], pool_param['pool_width'], pool_param['stride']

  # ================================================================ #
  # YOUR CODE HERE:
  #   Implement the max pooling backward pass.
  # ================================================================ #

  N, C, H, W = dout.shape
  dx = np.zeros(x.shape)

  for pt_idx in range(N):
    for channel_idx in range(C):
      for y_idx in range(H):
        for x_idx in range(W):
          x_start = x_idx * stride
          y_start = y_idx * stride
          patch = x[pt_idx, channel_idx, y_start:y_start+pool_height, x_start:x_start+pool_width]
          # Route the upstream gradient to the max location(s) within the window.
          dx[pt_idx, channel_idx, y_start:y_start+pool_height, x_start:x_start+pool_width] += \
              (patch == np.max(patch)) * dout[pt_idx, channel_idx, y_idx, x_idx]

  # ================================================================ #
  # END YOUR CODE HERE
  # ================================================================ #

  return dx

def spatial_batchnorm_forward(x, gamma, beta, bn_param):
  """
  Computes the forward pass for spatial batch normalization.

  Inputs:
  - x: Input data of shape (N, C, H, W)
  - gamma: Scale parameter, of shape (C,)
  - beta: Shift parameter, of shape (C,)
  - bn_param: Dictionary with the following keys:
    - mode: 'train' or 'test'; required
    - eps: Constant for numeric stability
    - momentum: Constant for running mean / variance. momentum=0 means that
      old information is discarded completely at every time step, while
      momentum=1 means that new information is never incorporated. The
      default of momentum=0.9 should work well in most situations.
    - running_mean: Array of shape (C,) giving running mean of features
    - running_var: Array of shape (C,) giving running variance of features

  Returns a tuple of:
  - out: Output data, of shape (N, C, H, W)
  - cache: Values needed for the backward pass
  """
  out, cache = None, None

  # ================================================================ #
  # YOUR CODE HERE:
  #   Implement the spatial batchnorm forward pass.
  #
  #   You may find it useful to use the batchnorm forward pass you
  #   implemented in HW #4.
  # ================================================================ #

  N, C, H, W = x.shape
  # Move the channel axis last and flatten to (N*H*W, C) so that vanilla
  # batchnorm normalizes each channel over all images and spatial locations.
  x = x.transpose(0, 2, 3, 1).reshape((-1, C))
  out, cache = batchnorm_forward(x, gamma, beta, bn_param)
  out = out.reshape((N, H, W, C)).transpose(0, 3, 1, 2)

  # ================================================================ #
  # END YOUR CODE HERE
  # ================================================================ #

  return out, cache


def spatial_batchnorm_backward(dout, cache):
  """
  Computes the backward pass for spatial batch normalization.

  Inputs:
  - dout: Upstream derivatives, of shape (N, C, H, W)
  - cache: Values from the forward pass

  Returns a tuple of:
  - dx: Gradient with respect to inputs, of shape (N, C, H, W)
  - dgamma: Gradient with respect to scale parameter, of shape (C,)
  - dbeta: Gradient with respect to shift parameter, of shape (C,)
  """
  dx, dgamma, dbeta = None, None, None

  # ================================================================ #
  # YOUR CODE HERE:
  #   Implement the spatial batchnorm backward pass.
  #
  #   You may find it useful to use the batchnorm backward pass you
  #   implemented in HW #4.
  # ================================================================ #

  N, C, H, W = dout.shape
  # Undo the spatial flattening used in the forward pass, run vanilla batchnorm
  # backward on the (N*H*W, C) array, then restore the (N, C, H, W) layout.
  dout = dout.transpose(0, 2, 3, 1).reshape((-1, C))
  dx, dgamma, dbeta = batchnorm_backward(dout, cache)
  dx = dx.reshape((N, H, W, C)).transpose(0, 3, 1, 2)

  # ================================================================ #
  # END YOUR CODE HERE
  # ================================================================ #

  return dx, dgamma, dbeta




































































CNN-BatchNorm


February 25, 2021


0.1    Spatial batch normalization

In fully connected networks, we performed batch normalization on the activations. To do something equivalent on CNNs, we modify batch normalization slightly.

Normally batch-normalization accepts inputs of shape (N, D) and produces outputs of shape (N, D), where we normalize across the minibatch dimension N. For data coming from convolutional layers, batch normalization accepts inputs of shape (N, C, H, W) and produces outputs of shape (N, C, H, W) where the N dimension gives the minibatch size and the (H, W) dimensions give the spatial size of the feature map.

How do we calculate the spatial averages? First, notice that each of the C feature maps (i.e., the layer has C filters) ought to have its own batch norm statistics, since each feature map may be picking out very different features in the images. However, within a feature map, we may assume that the first- and second-order statistics are relatively similar across all inputs and across all locations in the feature map. Hence, one way to think of spatial batch normalization is to reshape the (N, C, H, W) array as an (N*H*W, C) array and perform batch normalization on this array.
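As a quick illustration of this reshaping (a standalone sketch, not the required implementation), the per-channel statistics of the original (N, C, H, W) array match the column statistics of the reshaped (N*H*W, C) array:

x = np.random.randn(2, 3, 4, 5)                                   # (N, C, H, W)
x_flat = x.transpose(0, 2, 3, 1).reshape(-1, 3)                   # (N*H*W, C)
print(np.allclose(x.mean(axis=(0, 2, 3)), x_flat.mean(axis=0)))   # True
print(np.allclose(x.std(axis=(0, 2, 3)), x_flat.std(axis=0)))     # True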

Since spatial batch norm and batch normalization are similar, it’d be good at this point to also copy over your previously implemented layers from HW #4. Please copy and paste your prior implemented code from HW #4 to start this assignment. If you did not correctly implement the layers in HW #4, you may collaborate with a classmate to use their implementations from HW #4. You may also visit TA or Prof OH to correct your implementation.

You’ll want to copy and paste from HW #4:

    • layers.py for your FC network layers, as well as batchnorm and dropout.
    • layer_utils.py for your combined FC network layers.
    • optim.py for your optimizers.

Be sure to place these in the nndl/ directory so they’re imported correctly. Note, as announced in class, we will not be releasing our solutions.

If you use your prior implementations of the batchnorm, then your spatial batchnorm implemen-tation may be very short. Our implementations of the forward and backward pass are each 6 lines of code.

CS231n has built a solid API for building these modular frameworks and training them, and we will use their very well implemented framework as opposed to “reinventing the wheel.” This includes using their Solver, various utility functions, their layer structure, and their implementation of fast CNN layers. This also includes nndl.fc_net, nndl.layers, and nndl.layer_utils. As in prior assignments, we thank Serena Yeung & Justin Johnson for permission to use code written for the CS 231n class (cs231n.stanford.edu).


[1]:  ## Import and setups

import time

import numpy as np

import matplotlib.pyplot as plt

from nndl.conv_layers import *

from cs231n.data_utils import get_CIFAR10_data

from cs231n.gradient_check import eval_numerical_gradient, eval_numerical_gradient_array
from cs231n.solver import Solver

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0)  # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

def rel_error(x, y):
    """ returns relative error """
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))


0.2    Spatial batch normalization forward pass

Implement the forward pass, spatial_batchnorm_forward in nndl/conv_layers.py. Test your implementation by running the cell below.


[8]:  # Check the training-time forward pass by checking means and variances

# of features both before and after spatial batch normalization

N,C,H,W=2,3,4,5

x = 4 * np.random.randn(N, C, H, W) + 10

print('Before spatial batch normalization:')

print('    Shape: ', x.shape)

print('    Means: ', x.mean(axis=(0, 2, 3)))

print('    Stds: ', x.std(axis=(0, 2, 3)))

# Means should be close to zero and stds close to one
gamma, beta = np.ones(C), np.zeros(C)
bn_param = {'mode': 'train'}
out, _ = spatial_batchnorm_forward(x, gamma, beta, bn_param)
print('After spatial batch normalization:')
print('  Shape: ', out.shape)
print('  Means: ', out.mean(axis=(0, 2, 3)))
print('  Stds: ', out.std(axis=(0, 2, 3)))

# Means should be close to beta and stds close to gamma
gamma, beta = np.asarray([3, 4, 5]), np.asarray([6, 7, 8])
out, _ = spatial_batchnorm_forward(x, gamma, beta, bn_param)
print('After spatial batch normalization (nontrivial gamma, beta):')
print('  Shape: ', out.shape)
print('  Means: ', out.mean(axis=(0, 2, 3)))
print('  Stds: ', out.std(axis=(0, 2, 3)))

Before spatial batch normalization:
  Shape:  (2, 3, 4, 5)
  Means:  [ 9.7936955  10.91239001  9.61946536]
  Stds:  [3.3976253  4.04053375 3.87928827]

After spatial batch normalization:

Shape:    (2, 3, 4, 5)

Means:    [ 6.55031585e-16    5.10702591e-16 -2.92821323e-16]

Stds:    [0.99999957 0.99999969 0.99999967]

After spatial batch normalization (nontrivial gamma, beta):

Shape:    (2, 3, 4, 5)

Means:    [6. 7. 8.]

Stds:    [2.9999987    3.99999877 4.99999834]

0.3    Spatial batch normalization backward pass

Implement the backward pass, spatial_batchnorm_backward in nndl/conv_layers.py. Test your implementation by running the cell below.


[10]:  N, C, H, W = 2, 3, 4, 5

x = 5 * np.random.randn(N, C, H, W) + 12

gamma = np.random.randn(C)

beta = np.random.randn(C)

dout = np.random.randn(N, C, H, W)

bn_param = {'mode': 'train'}

fx = lambda x: spatial_batchnorm_forward(x, gamma, beta, bn_param)[0]

fg = lambda a: spatial_batchnorm_forward(x, gamma, beta, bn_param)[0]

fb = lambda b: spatial_batchnorm_forward(x, gamma, beta, bn_param)[0]

dx_num = eval_numerical_gradient_array(fx, x, dout)

da_num = eval_numerical_gradient_array(fg, gamma, dout)

db_num = eval_numerical_gradient_array(fb, beta, dout)

_, cache = spatial_batchnorm_forward(x, gamma, beta, bn_param)

dx, dgamma, dbeta = spatial_batchnorm_backward(dout, cache)

print('dx error: ', rel_error(dx_num, dx))



print('dgamma error: ', rel_error(da_num, dgamma))

print('dbeta error: ', rel_error(db_num, dbeta))

dx error:    1.818218710195061e-08

dgamma error:    3.277344768869509e-12

dbeta error:    3.88469904364782e-12





CNN


February 25, 2021


1    Convolutional neural networks

In this notebook, we’ll put together our convolutional layers to implement a 3-layer CNN. Then, we’ll ask you to implement a CNN that achieves > 65% validation accuracy on CIFAR-10.

CS231n has built a solid API for building these modular frameworks and training them, and we will use their very well implemented framework as opposed to “reinventing the wheel.” This includes using their Solver, various utility functions, their layer structure, and their implementation of fast CNN layers. This also includes nndl.fc_net, nndl.layers, and nndl.layer_utils. As in prior assignments, we thank Serena Yeung & Justin Johnson for permission to use code written for the CS 231n class (cs231n.stanford.edu).

If you have not completed the Spatial BatchNorm Notebook, please see the following description from that notebook:

Please copy and paste your prior implemented code from HW #4 to start this assignment. If you did not correctly implement the layers in HW #4, you may collaborate with a classmate to use their layer implementations from HW #4. You may also visit TA or Prof OH to correct your implementation.

You’ll want to copy and paste from HW #4:

    • layers.py for your FC network layers, as well as batchnorm and dropout.
    • layer_utils.py for your combined FC network layers.
    • optim.py for your optimizers.

Be sure to place these in the nndl/ directory so they’re imported correctly. Note, as announced in class, we will not be releasing our solutions.


[28]:  # As usual, a bit of setup

import numpy as np

import matplotlib.pyplot as plt

from nndl.cnn import *

from cs231n.data_utils import get_CIFAR10_data

from cs231n.gradient_check import eval_numerical_gradient_array, eval_numerical_gradient
from nndl.layers import *
from nndl.conv_layers import *
from cs231n.fast_layers import *
from cs231n.solver import Solver

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0)  # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

def rel_error(x, y):
    """ returns relative error """
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))

The autoreload extension is already loaded. To reload it, use:

%reload_ext autoreload


[29]:  # Load the (preprocessed) CIFAR10 data.

data = get_CIFAR10_data()

for k in data.keys():

print('{}: {} '.format(k, data[k].shape))

X_train: (49000, 3, 32, 32)

y_train: (49000,)

X_val: (1000, 3, 32, 32)

y_val: (1000,)

X_test: (1000, 3, 32, 32)

y_test: (1000,)

1.1    Three layer CNN

In this notebook, you will implement a three layer CNN. The ThreeLayerConvNet class is in nndl/cnn.py. You’ll need to modify that code for this section, including the initialization, as well as the calculation of the loss and gradients. You should be able to use the building blocks you have either earlier coded or that we have provided. Be sure to use the fast layers.

The architecture of this CNN will be:

conv - relu - 2x2 max pool - affine - relu - affine - softmax
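For the default hyperparameters in nndl/cnn.py (32 filters of size 7x7, hidden_dim=100, 10 classes) and CIFAR-10 inputs of shape (3, 32, 32), the parameter shapes work out as follows (illustrative only):

C, H, W = 3, 32, 32
num_filters, filter_size, hidden_dim, num_classes = 32, 7, 100, 10
# The conv layer (stride 1, pad=(filter_size-1)//2) preserves the 32x32 spatial size;
# the 2x2, stride-2 max pool then halves it to 16x16.
W1_shape = (num_filters, C, filter_size, filter_size)         # (32, 3, 7, 7)
W2_shape = (num_filters * (H // 2) * (W // 2), hidden_dim)    # (8192, 100)
W3_shape = (hidden_dim, num_classes)                          # (100, 10)
print(W1_shape, W2_shape, W3_shape)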

We won’t use batchnorm yet. You’ve also done enough of these to know how to debug; use the cells below.

Note: as we are implementing CNNs with several layers, some gradient error can be expected from the eval_numerical_gradient() function. If your W1 max relative error and W2 max relative error are around or below 0.01, they should be acceptable. Other errors should be less than 1e-5.




[37]:  num_inputs = 2

input_dim = (3, 16, 16)

reg = 0.0

num_classes = 10

X = np.random.randn(num_inputs, *input_dim)

y = np.random.randint(num_classes, size=num_inputs)

model = ThreeLayerConvNet(num_filters=3, filter_size=3, input_dim=input_dim, hidden_dim=7, dtype=np.float64)

loss, grads = model.loss(X, y)

for param_name in sorted(grads):
    f = lambda _: model.loss(X, y)[0]
    param_grad_num = eval_numerical_gradient(f, model.params[param_name], verbose=False, h=1e-6)
    e = rel_error(param_grad_num, grads[param_name])
    print('{} max relative error: {}'.format(param_name, rel_error(param_grad_num, grads[param_name])))

W1 max relative error: 0.09894537796840001

W2 max relative error: 0.0022220564576085388

W3 max relative error: 0.0005093512347850536

b1 max relative error: 1.1607904538515381e-05

b2 max relative error: 2.529284523196727e-07

b3 max relative error: 6.580369384445221e-10

1.1.1    Overfit small dataset

To check your CNN implementation, let’s overfit a small dataset.


[39]:  num_train = 100

small_data = {

'X_train': data['X_train'][:num_train],

'y_train': data['y_train'][:num_train],

'X_val': data['X_val'],

'y_val': data['y_val'],

}

model = ThreeLayerConvNet(weight_scale=1e-2)

solver = Solver(model, small_data,

num_epochs=10, batch_size=50,

update_rule='adam',

optim_config={

'learning_rate': 1e-3,

},

verbose=True, print_every=1)

solver.train()



(Iteration 1 / 20) loss: 2.367147
(Epoch 0 / 10) train acc: 0.120000; val_acc: 0.108000
(Iteration 2 / 20) loss: 3.027329
(Epoch 1 / 10) train acc: 0.230000; val_acc: 0.130000
(Iteration 3 / 20) loss: 2.305673
(Iteration 4 / 20) loss: 2.258074
(Epoch 2 / 10) train acc: 0.400000; val_acc: 0.163000
(Iteration 5 / 20) loss: 1.941648
(Iteration 6 / 20) loss: 1.870670
(Epoch 3 / 10) train acc: 0.410000; val_acc: 0.098000
(Iteration 7 / 20) loss: 2.008817
(Iteration 8 / 20) loss: 1.210222
(Epoch 4 / 10) train acc: 0.430000; val_acc: 0.171000
(Iteration 9 / 20) loss: 2.117600
(Iteration 10 / 20) loss: 1.535911
(Epoch 5 / 10) train acc: 0.620000; val_acc: 0.179000
(Iteration 11 / 20) loss: 1.257314
(Iteration 12 / 20) loss: 0.965287
(Epoch 6 / 10) train acc: 0.620000; val_acc: 0.178000
(Iteration 13 / 20) loss: 0.910322
(Iteration 14 / 20) loss: 0.769668
(Epoch 7 / 10) train acc: 0.790000; val_acc: 0.240000
(Iteration 15 / 20) loss: 0.717222
(Iteration 16 / 20) loss: 0.624393
(Epoch 8 / 10) train acc: 0.820000; val_acc: 0.223000
(Iteration 17 / 20) loss: 0.590381
(Iteration 18 / 20) loss: 0.690635
(Epoch 9 / 10) train acc: 0.860000; val_acc: 0.227000
(Iteration 19 / 20) loss: 0.590675
(Iteration 20 / 20) loss: 0.255355
(Epoch 10 / 10) train acc: 0.910000; val_acc: 0.203000


[41]:  plt.subplot(2, 1, 1)

plt.plot(solver.loss_history, 'o')

plt.xlabel('iteration')

plt.ylabel('loss')

plt.subplot(2, 1, 2)

plt.plot(solver.train_acc_history, '-o')

plt.plot(solver.val_acc_history, '-o')

plt.legend(['train', 'val'], loc='upper left')

plt.xlabel('epoch')

plt.ylabel('accuracy')

plt.show()













































1.2    Train the network

Now we train the 3 layer CNN on CIFAR-10 and assess its accuracy.


[42]:  model = ThreeLayerConvNet(weight_scale=0.001, hidden_dim=500, reg=0.001)

solver = Solver(model, data,

num_epochs=1, batch_size=50,

update_rule='adam',

optim_config={

'learning_rate': 1e-3,

},

verbose=True, print_every=20)

solver.train()

(Iteration 1 / 980) loss: 2.304397
(Epoch 0 / 1) train acc: 0.096000; val_acc: 0.088000
(Iteration 21 / 980) loss: 2.260175
(Iteration 41 / 980) loss: 2.074784
(Iteration 61 / 980) loss: 1.851205
(Iteration 81 / 980) loss: 1.981100

(Iteration 101 / 980) loss: 2.231182

(Iteration 121 / 980) loss: 2.150490

(Iteration 141 / 980) loss: 1.765426

(Iteration 161 / 980) loss: 1.601561

(Iteration 181 / 980) loss: 1.833612

(Iteration 201 / 980) loss: 1.716498

(Iteration 221 / 980) loss: 1.858625

(Iteration 241 / 980) loss: 1.579105

(Iteration 261 / 980) loss: 1.891379

(Iteration 281 / 980) loss: 1.825264

(Iteration 301 / 980) loss: 1.653320

(Iteration 321 / 980) loss: 1.903400

(Iteration 341 / 980) loss: 1.928545

(Iteration 361 / 980) loss: 1.936250

(Iteration 381 / 980) loss: 1.647481

(Iteration 401 / 980) loss: 1.499575

(Iteration 421 / 980) loss: 1.960999

(Iteration 441 / 980) loss: 1.615085

(Iteration 461 / 980) loss: 1.704491

(Iteration 481 / 980) loss: 1.775115

(Iteration 501 / 980) loss: 1.673615

(Iteration 521 / 980) loss: 1.729676

(Iteration 541 / 980) loss: 1.696017

(Iteration 561 / 980) loss: 1.553330

(Iteration 581 / 980) loss: 1.957229

(Iteration 601 / 980) loss: 1.792699

(Iteration 621 / 980) loss: 1.516574

(Iteration 641 / 980) loss: 1.558088

(Iteration 661 / 980) loss: 1.501056

(Iteration 681 / 980) loss: 1.616942

(Iteration 701 / 980) loss: 1.497161

(Iteration 721 / 980) loss: 1.574669

(Iteration 741 / 980) loss: 1.651415

(Iteration 761 / 980) loss: 1.787088

(Iteration 781 / 980) loss: 1.396680

(Iteration 801 / 980) loss: 1.705457

(Iteration 821 / 980) loss: 1.466398

(Iteration 841 / 980) loss: 1.488631

(Iteration 861 / 980) loss: 1.763697

(Iteration 881 / 980) loss: 1.464066

(Iteration 901 / 980) loss: 1.420943

(Iteration 921 / 980) loss: 1.746861

(Iteration 941 / 980) loss: 1.592969

(Iteration 961 / 980) loss: 1.368916

(Epoch 1 / 1) train acc: 0.448000; val_acc: 0.434000




2    Get > 65% validation accuracy on CIFAR-10.

In the last part of the assignment, we’ll now ask you to train a CNN to get better than 65% validation accuracy on CIFAR-10.

2.0.1    Things you should try:

    • Filter size: Above we used 7x7; but VGGNet and onwards showed stacks of 3x3 filters are good.

    • Number of filters: Above we used 32 filters. Do more or fewer do better?

    • Batch normalization: Try adding spatial batch normalization after convolution layers and vanilla batch normalization after affine layers. Do your networks train faster?
    • Network architecture: Can a deeper CNN do better? Consider these architectures:

– [conv-relu-pool]xN - conv - relu - [affine]xM - [softmax or SVM]

– [conv-relu-pool]xN - [affine]xM - [softmax or SVM]

– [conv-relu-conv-relu-pool]xN - [affine]xM - [softmax or SVM]

2.0.2    Tips for training

For each network architecture that you try, you should tune the learning rate and regularization strength. When doing this there are a couple important things to keep in mind:

    • If the parameters are working well, you should see improvement within a few hundred iterations.

    • Remember the coarse-to-fine approach for hyperparameter tuning: start by testing a large range of hyperparameters for just a few training iterations to find the combinations of parameters that are working at all (see the sketch after this list).
    • Once you have found some sets of parameters that seem to work, search more finely around these parameters. You may need to train for more epochs.
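Below is a minimal sketch of such a coarse search loop (illustrative only; it reuses the ThreeLayerConvNet and Solver APIs from earlier in this notebook, and the hyperparameter grids are arbitrary placeholder values):

results = {}
for lr in [1e-4, 1e-3, 1e-2]:            # arbitrary coarse grid
    for reg in [0.0, 1e-3, 1e-2]:        # arbitrary coarse grid
        model = ThreeLayerConvNet(weight_scale=0.001, hidden_dim=500, reg=reg)
        solver = Solver(model, data,
                        num_epochs=1, batch_size=50,
                        update_rule='adam',
                        optim_config={'learning_rate': lr},
                        verbose=False)
        solver.train()
        results[(lr, reg)] = max(solver.val_acc_history)
        print('lr %.0e reg %.0e -> val_acc %.3f' % (lr, reg, results[(lr, reg)]))
# Narrow the ranges around the best (lr, reg) found above and train for more epochs.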


[ ]:  # ================================================================ #
# YOUR CODE HERE:
#   Implement a CNN to achieve greater than 65% validation accuracy
#   on CIFAR-10.
# ================================================================ #

model = ThreeLayerConvNet(num_filters=64,
                          weight_scale=0.001,
                          hidden_dim=500,
                          reg=0.0015,
                          filter_size=3)

solver = Solver(model,
                data,
                num_epochs=8,
                batch_size=500,
                update_rule='adam',
                optim_config={
                    'learning_rate': 1e-3,
                },
                lr_decay=0.9,
                verbose=True, print_every=20)

solver.train()

# ================================================================ #
# END YOUR CODE HERE
# ================================================================ #

(Iteration 1 / 784) loss: 2.308742
(Epoch 0 / 8) train acc: 0.082000; val_acc: 0.098000
(Iteration 21 / 784) loss: 1.772929
(Iteration 41 / 784) loss: 1.627069
(Iteration 61 / 784) loss: 1.565730
(Iteration 81 / 784) loss: 1.427402
(Epoch 1 / 8) train acc: 0.572000; val_acc: 0.562000
(Iteration 101 / 784) loss: 1.385572
(Iteration 121 / 784) loss: 1.354499
(Iteration 141 / 784) loss: 1.224124
(Iteration 161 / 784) loss: 1.362167
(Iteration 181 / 784) loss: 1.180917
(Epoch 2 / 8) train acc: 0.616000; val_acc: 0.594000
(Iteration 201 / 784) loss: 1.159099
(Iteration 221 / 784) loss: 1.222184
(Iteration 241 / 784) loss: 1.171004
(Iteration 261 / 784) loss: 1.041830
(Iteration 281 / 784) loss: 1.170267
(Epoch 3 / 8) train acc: 0.678000; val_acc: 0.606000
(Iteration 301 / 784) loss: 1.006070
(Iteration 321 / 784) loss: 1.064316
(Iteration 341 / 784) loss: 1.011714
(Iteration 361 / 784) loss: 1.005690
(Iteration 381 / 784) loss: 0.957487
(Epoch 4 / 8) train acc: 0.725000; val_acc: 0.635000
(Iteration 401 / 784) loss: 0.873325
(Iteration 421 / 784) loss: 0.914529
(Iteration 441 / 784) loss: 0.888513
(Iteration 461 / 784) loss: 0.900333
(Iteration 481 / 784) loss: 0.927238
(Epoch 5 / 8) train acc: 0.753000; val_acc: 0.667000










3    cnn.py


[ ]:  import numpy as np

from nndl.layers import *

from nndl.conv_layers import *

from cs231n.fast_layers import *

from nndl.layer_utils import *

from nndl.conv_layer_utils import *

import pdb

"""

This code was originally written for CS 231n at Stanford University

(cs231n.stanford.edu).    It has been modified in various areas for use in the

ECE 239AS class at UCLA.    This includes the descriptions of what code to

implement as well as some slight potential changes in variable names to be

consistent with class nomenclature.    We thank Justin Johnson & Serena Yeung for

permission to use this code.    To see the original version, please visit

cs231n.stanford.edu.

"""

class ThreeLayerConvNet(object):
  """
  A three-layer convolutional network with the following architecture:

  conv - relu - 2x2 max pool - affine - relu - affine - softmax

  The network operates on minibatches of data that have shape (N, C, H, W)
  consisting of N images, each with height H and width W and with C input
  channels.
  """

  def __init__(self, input_dim=(3, 32, 32), num_filters=32, filter_size=7,
               hidden_dim=100, num_classes=10, weight_scale=1e-3, reg=0.0,
               dtype=np.float32, use_batchnorm=False):
    """
    Initialize a new network.

    Inputs:
    - input_dim: Tuple (C, H, W) giving size of input data
    - num_filters: Number of filters to use in the convolutional layer
    - filter_size: Size of filters to use in the convolutional layer
    - hidden_dim: Number of units to use in the fully-connected hidden layer
    - num_classes: Number of scores to produce from the final affine layer.
    - weight_scale: Scalar giving standard deviation for random initialization
      of weights.
    - reg: Scalar giving L2 regularization strength
    - dtype: numpy datatype to use for computation.
    """
    self.use_batchnorm = use_batchnorm
    self.params = {}
    self.reg = reg
    self.dtype = dtype

    # ================================================================ #
    # YOUR CODE HERE:
    #   Initialize the weights and biases of a three layer CNN. To initialize:
    #     - the biases should be initialized to zeros.
    #     - the weights should be initialized to a matrix with entries
    #       drawn from a Gaussian distribution with zero mean and
    #       standard deviation given by weight_scale.
    # ================================================================ #

    C, H, W = input_dim
    # Spatial size after the 2x2, stride-2 max pool applied to the conv output
    # (the conv layer preserves H and W because of the padding chosen in loss()).
    pool_height = (H - 2) // 2 + 1
    pool_width = (W - 2) // 2 + 1

    self.params['W1'] = np.random.normal(0, weight_scale,
                                         [num_filters, C, filter_size, filter_size])
    self.params['b1'] = np.zeros(num_filters)
    self.params['W2'] = np.random.normal(0, weight_scale,
                                         [pool_height * pool_width * num_filters, hidden_dim])
    self.params['b2'] = np.zeros(hidden_dim)
    self.params['W3'] = np.random.normal(0, weight_scale, [hidden_dim, num_classes])
    self.params['b3'] = np.zeros(num_classes)

    # ================================================================ #
    # END YOUR CODE HERE
    # ================================================================ #

    for k, v in self.params.items():
      self.params[k] = v.astype(dtype)


  def loss(self, X, y=None):
    """
    Evaluate loss and gradient for the three-layer convolutional network.

    Input / output: Same API as TwoLayerNet in fc_net.py.
    """
    W1, b1 = self.params['W1'], self.params['b1']
    W2, b2 = self.params['W2'], self.params['b2']
    W3, b3 = self.params['W3'], self.params['b3']

    # pass conv_param to the forward pass for the convolutional layer
    filter_size = W1.shape[2]
    conv_param = {'stride': 1, 'pad': (filter_size - 1) / 2}

    # pass pool_param to the forward pass for the max-pooling layer
    pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}

    scores = None

    # ================================================================ #
    # YOUR CODE HERE:
    #   Implement the forward pass of the three layer CNN.  Store the output
    #   scores as the variable "scores".
    # ================================================================ #

    out, cache_conv_relu_pool = conv_relu_pool_forward(X, W1, b1, conv_param, pool_param)
    out_shape = out.shape
    out = out.reshape(out.shape[0], -1)
    out, cache_affine_relu = affine_relu_forward(out, W2, b2)
    scores, cache_affine = affine_forward(out, W3, b3)

    # ================================================================ #
    # END YOUR CODE HERE
    # ================================================================ #

    if y is None:
      return scores

    loss, grads = 0, {}

    # ================================================================ #
    # YOUR CODE HERE:
    #   Implement the backward pass of the three layer CNN.  Store the grads
    #   in the grads dictionary, exactly as before (i.e., the gradient of
    #   self.params[k] will be grads[k]).  Store the loss as "loss", and
    #   don't forget to add regularization on ALL weight matrices.
    # ================================================================ #

    loss_softmax, dout = softmax_loss(scores, y)
    loss_regularization = self.reg * 0.5 * (np.sum(self.params['W1']**2) +
                                            np.sum(self.params['W2']**2) +
                                            np.sum(self.params['W3']**2))
    loss = loss_softmax + loss_regularization

    dout, grads['W3'], grads['b3'] = affine_backward(dout, cache_affine)
    dout, grads['W2'], grads['b2'] = affine_relu_backward(dout, cache_affine_relu)
    dout = dout.reshape(out_shape)
    dout, grads['W1'], grads['b1'] = conv_relu_pool_backward(dout, cache_conv_relu_pool)

    grads['W3'] += self.reg * self.params['W3']
    grads['W2'] += self.reg * self.params['W2']
    grads['W1'] += self.reg * self.params['W1']

    # ================================================================ #
    # END YOUR CODE HERE
    # ================================================================ #

    return loss, grads


pass



































