For this programming assignment you will design a perceptron for a classification task and report your results in a 1-3 page report. In addition to the model you will also need to generate your own data to test your implementation.
PART 1 (50 points)
Problem
You are tasked with building a neural network that discriminates shapes. Specifically, you want it to differentiate between these two exact shapes:
The classifier should recognize each shape regardless of its orientation (facing up, down, left or right). Your submission must include the code that generates this data. The generator must be able to produce data for different image sizes (the shape size remains the same). In other words, your data generator must:
Create an empty image of size N-by-N, where N is specified by the user. I suggest using an N-by-N matrix where 0 represents white and 1 represents black.
Generate input images by inserting the provided two shapes at different locations and orientations on the empty image (make use of flip and transpose operations for the orientation). Make sure to keep track of the labels as you generate the data. The labels are just binary values 1 or 0 which will be assigned to a shape.
Separate your data randomly into two parts: a training set and a testing set. Your training set should be roughly 70% of the data you generated (make sure to apply a random permutation to your data before separating). You may implement the data separation step however you like, as long as you can keep track of which set each data point belongs to.
Your data generating function should return the generated data along with the labels. Ideally you want each generated sample to be unique. For small enough images your function should simply generate all possible samples (each shape at each possible position at each possible orientation). No samples in the test set should be present in the training set.
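The generator steps above can be sketched as follows. This is only a minimal illustration, not a reference solution: the two 3-by-3 patterns SHAPE_A and SHAPE_B are hypothetical placeholders for the assigned shapes, the function names are my own, and plain Python lists are used so the sketch has no dependencies. It enumerates every shape at every orientation and position, shuffles, and splits roughly 70/30.

```python
import random

# Hypothetical placeholder shapes: substitute the two shapes from the assignment.
SHAPE_A = [[1, 1, 1],
           [1, 0, 0],
           [1, 0, 0]]
SHAPE_B = [[1, 1, 1],
           [0, 1, 0],
           [0, 1, 0]]


def transpose(m):
    return [list(row) for row in zip(*m)]


def flip_lr(m):
    return [row[::-1] for row in m]


def rotate90(m):
    # A 90-degree clockwise rotation is a transpose followed by a left-right flip.
    return flip_lr(transpose(m))


def orientations(shape):
    """Return the distinct rotations of a shape (up to four)."""
    seen, result, current = set(), [], shape
    for _ in range(4):
        key = tuple(map(tuple, current))
        if key not in seen:
            seen.add(key)
            result.append(current)
        current = rotate90(current)
    return result


def generate_data(n, seed=0):
    """Enumerate every shape/orientation/position on an n-by-n image.

    Returns (train, test), each a list of (image, label) pairs.
    """
    samples = []
    for label, shape in ((0, SHAPE_A), (1, SHAPE_B)):
        for oriented in orientations(shape):
            h, w = len(oriented), len(oriented[0])
            for top in range(n - h + 1):
                for left in range(n - w + 1):
                    image = [[0] * n for _ in range(n)]  # 0 = white
                    for r in range(h):
                        for c in range(w):
                            image[top + r][left + c] = oriented[r][c]
                    samples.append((image, label))
    random.Random(seed).shuffle(samples)  # random permutation before splitting
    cut = round(0.7 * len(samples))       # roughly 70% for training
    return samples[:cut], samples[cut:]


train_set, test_set = generate_data(n=6)
print(len(train_set), len(test_set))  # 90 38 with these placeholder shapes
```

Because each (shape, orientation, position) combination is enumerated exactly once before the split, no test sample can also appear in the training set. For larger N you may prefer to sample positions at random instead of enumerating all of them.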
Building a Perceptron
After generating the data you will need to build a neural network, specifically a perceptron similar to the one in the supervised learning slides. Your perceptron should:
Reshape the matrices into 1-dimensional vectors (transform N-by-N images into (N*N)-by-1 vectors).
Initialize the network given your input size and set of parameters (learning rate). Your weights should be initialized to a random number between 0 and 1 (don't forget about the bias). The output of your network will then be only one value. If you are using a soft activation function just treat anything above 0.5 as true and below 0.5 as false.
Use the perceptron learning rule to try to learn the set of weights that will discriminate between the two sets.
After each epoch (a pass through all your training data) report the error for the training and test sets (the number of samples the model classifies incorrectly). You can accumulate the number of wrong classifications in the training set as you run the perceptron training rule.
Your algorithm should stop at convergence (training error reaches 0). It is also a good idea to include a maximum number of epochs in case the data is not linearly separable.
It is up to you whether to use the threshold activation function or the sigmoid activation function.
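One possible shape for the training loop described above is sketched below. It assumes the threshold activation, samples stored as (image, label) pairs, and helper names (flatten, predict, train_perceptron) that are my own. The toy 2-by-2 data at the bottom exists only to make the sketch runnable; it is not the assignment's data.

```python
import random


def flatten(image):
    """Reshape an N-by-N image into a vector of length N*N."""
    return [pixel for row in image for pixel in row]


def predict(weights, bias, x):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def count_errors(weights, bias, samples):
    return sum(predict(weights, bias, flatten(img)) != y for img, y in samples)


def train_perceptron(train, test, lr=0.5, max_epochs=200, seed=0):
    rng = random.Random(seed)
    n_inputs = len(flatten(train[0][0]))
    weights = [rng.random() for _ in range(n_inputs)]  # uniform in [0, 1)
    bias = rng.random()
    history = []  # one (train_errors, test_errors) pair per epoch
    for _ in range(max_epochs):
        train_errors = 0
        for image, target in train:
            x = flatten(image)
            delta = target - predict(weights, bias, x)  # -1, 0 or +1
            if delta != 0:
                train_errors += 1
                # Perceptron rule: w <- w + lr * (target - output) * x
                weights = [w + lr * delta * xi for w, xi in zip(weights, x)]
                bias += lr * delta
        history.append((train_errors, count_errors(weights, bias, test)))
        if train_errors == 0:  # converged: a full pass with no mistakes
            break
    return weights, bias, history


# Toy usage: 2-by-2 images, label 1 when both top pixels are black.
images = [[[a, b], [c, d]] for a in (0, 1) for b in (0, 1)
          for c in (0, 1) for d in (0, 1)]
data = [(img, int(img[0][0] == 1 and img[0][1] == 1)) for img in images]
random.Random(1).shuffle(data)
weights, bias, history = train_perceptron(data[:11], data[11:])
for epoch, (tr, te) in enumerate(history, start=1):
    print(f"epoch {epoch}: train errors = {tr}, test errors = {te}")
```

The training error is accumulated during the pass, so it counts mistakes made with the weights as they were when each sample was seen. The test error is measured once at the end of the epoch with the updated weights. The max_epochs cap keeps the loop finite if the data turns out not to be linearly separable.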
Report
Along with the code you will need to write a short report that includes the following:
A very brief description of your design decisions. What activation function did you use? Is there anything special about how the data is generated?
What happens as you change the size of the image? Does the algorithm take more epochs to learn? Fewer? Try out some values of N from 6 up to 15 (you don't need to try every value; a few will suffice).
What happens as you increase/decrease the learning rate? Start by setting your learning rate to 0.5 and change it gradually. What is the optimal learning rate?
Your report should explain the reasoning behind your answers. Provide plots that compare the error rate between the testing and training data sets across epochs to justify your results. Think of a line plot with one (ideally decreasing) line each for training and testing, where the y axis is the error rate and the x axis is the epoch.
EXTRA POINTS: PART 2 (20 points)
Repeat the previous experiments with two modifications which should be included in the report:
Introduce noise into your input data: give your data generating function a small probability (around 0.1) of turning a white pixel black. Introduce the noise before inserting the shape into the image. How does the perceptron perform as you add more and more noise?
Modify your implementation of the perceptron into a multilayer perceptron. Use the instructions in the backpropagation lecture for this. You will need to use the sigmoid function and the mean squared error as your loss. You should provide graphs showing how much the loss function decreased at every epoch for both the test and train sets. Note that now at every epoch you will have an error term, which is the number of incorrect classifications, and a loss term, which is the mean squared error (sum the squared errors over all samples and divide by the number of samples). How does the performance of a multilayer perceptron compare to a single perceptron? Compare different numbers of hidden units and their performance in the scenarios in which you tested your perceptron before.
Try to be concise in your report. It is preferable that you provide your results organized in tables and plots that summarize your experiments. Your submission should contain the following elements:
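As a rough illustration of the two bookkeeping pieces mentioned above, the sketch below shows one way to add background noise and one way to compute the per-epoch error count and mean squared error from sigmoid outputs. The names add_noise and epoch_metrics are hypothetical, and the backpropagation step itself is deliberately left out because it should follow the lecture.

```python
import math
import random


def add_noise(image, p=0.1, rng=random):
    """Return a copy where each white pixel (0) turns black (1) with probability p."""
    return [[1 if pixel == 0 and rng.random() < p else pixel for pixel in row]
            for row in image]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def epoch_metrics(outputs, targets):
    """Return (misclassification count, mean squared error) for one epoch.

    An output above 0.5 is treated as class 1, otherwise as class 0.
    """
    errors = sum((o > 0.5) != (t == 1) for o, t in zip(outputs, targets))
    loss = sum((t - o) ** 2 for o, t in zip(outputs, targets)) / len(targets)
    return errors, loss


# Usage: noise is applied to the empty image, before any shape is inserted.
rng = random.Random(0)
blank = [[0] * 6 for _ in range(6)]
noisy_background = add_noise(blank, p=0.1, rng=rng)

outputs = [sigmoid(2.0), sigmoid(-1.0), sigmoid(0.5)]
errors, loss = epoch_metrics(outputs, targets=[1, 0, 0])
print(errors, loss)
```

Applying the noise to the blank image first means the shape's own pixels are never altered, since only white pixels can change.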
Code that generates the data and trains/tests a perceptron model to discriminate it.
A readme file that states how to run your program
A project report showing the results of your experiments and your analysis
Be sure to start your assignment early so you have time to ask questions about it.