Homework 1 Solution

Starting from:

~~$30~~

$24

Home

Homework 1 Solution

You are allowed to discuss the homework problems with your classmates, but you are supposed to do your assignment individually.

You cannot use an existing machine learning / neural networking / etc. library.

You need to turn in the computer codes also.

(30 pts) Design a two-layer neural network with the signum activation function (i.e. sgn(x) = 1 if x 0, sgn(x) = 1 if x < 0, and sgn(0) = 0) such that the network implements the logic gate
f(x1; x2; x3) = x1x2x3 + x1x2. Assume that the input of 1 is used to represent a FALSE, and an input of 1 is used to represent a TRUE. Show your work and draw the final network. Note that in class, we have discussed examples where we have instead used the step activation function and a 0 for FALSE.

2. (30 pts) Consider the network in Fig. 1. In the x y plane, sketch the region where z = 1. Show your work. Make sure you correctly indicate which part of the boundaries belong to the region z = 1. Recall that u(x) = 1 if x 0 and u(x) = 0 if x < 0.

1

1
+
u(.)
1

x
-1
1

-1.5

-1
+
u(.)
1

z

-1
+
u(.)

y
-1

-1

+
u(.)

Figure 1: The neural network for Problem 2.

(40 pts) Write a computer program that runs the perceptron training algorithm with the step activa-tion function u( ). You have to include the source files of your computer program together with the solution of the problem. Implement the following steps.

1
Do not be scared at the fact that the question has a million steps. Most of the steps are very simple and they are just there to make your life easier.
(b) Pick (your code should pick it) w0 uniformly at random on [
1
;
1
].
4

4

Pick w1 uniformly at random on [ 1; 1].

Pick w2 uniformly at random on [ 1; 1].

Write in your report the numbers [w0; w1; w2] you have picked.

Pick n = 100 vectors x1; : : : ; xn independently and uniformly at random on [ 1; 1]2, call the collection of these vectors S.
Let S1 S denote the collection of all x = [x1 x2] 2 S satisfying [1 x1 x2][w0 w1; w2]T 0.

Let S0 S denote the collection of all x = [x1 x2] 2 S satisfying [1 x1 x2][w0 w1; w2]T < 0.

In one plot, show the line w0 + w1x1 + w2x2 = 0, with x1 being the “x-axis” and x2 being the “y-axis.” In the same plot, show all the points in S1 and all the points in S0. Use different symbols for S0 and S1. Indicate which points belong to which class. An example figure may be as shown in Fig. 2 (My labels look bad, I expect you to do a better job!).

Figure 2: An example figure for Problem 3i.

Use the perceptron training algorithm to find the weights that can separate the two classes S0 and S1 (Obviously you already know such weights, they are w0; w1 and w2 above, but we will find the weights from scratch, and the training sets S0 and S1). In detail,
i. Use the training parameter = 1.

ii. Pick w0′ ; w1′ ; w2′ independently and uniformly at random on [ 1; 1]. Write them in your report.

Record the number of misclassifications if we use the weights [w0′ ; w1′ ; w2′ ].
After one epoch of the perceptron training algorithm, you will find a new set of weights
[w0′′ ; w1′′ ; w2′′ ].

Record the number of misclassifications if we use the weights [w0′′ ; w1′′ ; w2′′ ].

Do another epoch of the perceptron training algorithm, find a new set of weights, record the number of misclassifications, and so on, until convergence.

2
Write down the final weights you obtain in your report. How does these weights compare to the “optimal” weights [w0; w1; w2]?

Regarding the previous step, draw a graph that shows the epoch number vs the number of mis-classifications.
Repeat the same experiment with = 10. Do not change w0; w1; w2; S; w0′ ; w1′ ; w2′ . As in the case = 1, draw a graph that shows the epoch number vs the number of misclassifications.

Repeat the same experiment with = 0:1. Do not change w0; w1; w2; S; w0′ ; w1′ ; w2′ . As in the case = 1, draw a graph that shows the epoch number vs the number of misclassifications.

Comment on how the changes in effect the number of epochs needed until convergence.

Comment on whether we would get the exact same results (in terms of the effects of on training performance) if we had started with different w0; w1; w2; S; w0′ ; w1′ ; w2′ .

Do the same experiments with n = 1000 samples. Comment on the differences compared to n = 100.

3