Starting from:
$35

$29

Homework 3 (for analytical part for programming part) Solution

Instructions




Please use a word processing software (e.g., Microsoft word) to write your answers and submit a printed copy to me at the beginning of the class on Feb 8. The rationale is that it is sometimes hard to read and understand the hand-written answers.




All homeworks should be done individually.







Analytical Part (50 points)




Q1. (5 points) Answer the following with a yes or no along with proper justi cation.




Is the decision boundary of voted perceptron linear?



Is the decision boundary of averaged perceptron linear?



Q2. (5 points) Consider the following setting. You are provided with n training examples: (x1; y1; h1); (x2; y2; h2); ; (xn; yn; hn), where xi is the input example, yi is the class label (+1 or -1), and hi 0 is the importance weight of the example. The teacher gave you some additional information by specifying the importance of each training example. How will you modify the perceptron algorithm to be able to leverage this extra information? Please justify your answer.




Q3. (5 points) Consider the following setting. You are provided with n training examples: (x1; y1); (x2; y2); ; (xn; yn), where xi is the input example, and yi is the class label (+1 or -1). However, the training data is highly imbalanced (say 90% of the examples are negative and 10% of the examples are positive) and we care more about the accuracy of positive examples. How will you modify the perceptron algorithm to solve this learning problem? Please justify your answer.




Q4. (10 points)




You were just hired by MetaMind. MetaMind is expanding rapidly, and you decide to use your machine learning skills to assist them in their attempts to hire the best. To do so, you have the following available to you for each candidate i in the pool of candidates I: (i) Their GPA, (ii) Whether they took Data Mining course and achieved an A, (iii) Whether they took Algorithms course and achieved an A, (iv) Whether they have a job o er from Google, (v) Whether they have a job o er from Facebook, (vi) The number of misspelled words on their resume. You decide to represent each candidate i 2 I by a corresponding 6-dimensional




feature vector f(x(i)). You believe that if you just knew the right weight vector w 2 <6 you could reliably predict the quality of a candidate i by computing w f(x(i)). To determine w your boss lets you sample pairs of candidates from the pool. For a pair of candidates (k; l) you can have them face o in a \DataMining- ght." The result is score (k l), which tells you that candidate k is at least score (k l) better than candidate l. Note that the score will be negative when l is a better candidate than k. Assume you collected scores for a set of pairs of candidates P.




Describe how you could use a perceptron based algorithm to learn the weight vector w. Make sure to describe the basic intuition; how the weight updates will be done; and pseudo-code for the entire algorithm.




Q5. (15 points) Suppose we have n+ positive training examples and n negative training examples. Let C+ be the center of the positive examples and C be the center of the negative

examples, i.e., C+ =


1
Pi: yi=+1 xi
and C
=
1
Pi: yi= 1 xi. Consider a simple classi er
n+
n
called CLOSE that classi es a test example x by assigning it to the class whose center is closest.




Show that the decision boundary of the CLOSE classi er is a linear hyperplane of the form sign(w x + b). Compute the values of w and b in terms of C+ and C .




Recall that the weight vector can be written as a linear combination of all the training

examples: w =


n++n
i yi xi. Compute the dual weights ( ’s). How many of the


i=1
training
examples are support vectors?


P




Q6. (10 points) Please read the following paper and write a brief summary of the main points in at most TWO pages.




Pedro M. Domingos: A few useful things to know about machine learning. Communications




of ACM 55(10): 78-87 (2012)

https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf










Programming and Experimental Part (50 points)




(35 points) Fortune Cookie Classi er1




You will build a binary fortune cookie classi er. This classi er will be used to classify fortune cookie messages into two classes: messages that predict what will happen in the future (class 1) and messages that just contain a wise saying (class 0). For example,




\Never go in against a Sicilian when death is on the line" would be a message in class 0.




\You will get an A in Machine learning class" would be a message in class 1.







Thanks to Weng-Keen Wong and his advisor Andrew Moore for sharing the data.






2



Files Provided There are three sets of les. All words in these les are lower case and punctuation has been removed.




1) The training data:




traindata.txt: This is the training data consisting of fortune cookie messages.




trainlabels.txt: This le contains the class labels for the training data.




2) The testing data:




testdata.txt: This is the testing data consisting of fortune cookie messages.




testlabels.txt: This le contains the class labels for the testing data.




3) A list of stopwords: stoplist.txt




There are two steps: the pre-processing step and the classi cation step. In the pre-processing step, you will convert fortune cookie messages into features to be used by your classi er. You will be using a bag of words representation. The following steps outline the process involved:




Form the vocabulary. The vocabulary consists of the set of all the words that are in the training data with stop words removed (stop words are common, uninformative words such as \a" and \the" that are listed in the le stoplist.txt). The vocabulary will now be the features of your training data. Keep the vocabulary in alphabetical order to help you with debugging.




Now, convert the training data into a set of features. Let M be the size of your vocabulary. For each fortune cookie message, you will convert it into a feature vector of size M. Each slot in that feature vector takes the value of 0 or 1. For these M slots, if the ith slot is 1, it means that the ith word in the vocabulary is present in the fortune cookie message; otherwise, if it is 0, then the ith word is not present in the message. Most of these feature vector slots will be 0. Since you are keeping the vocabulary in alphabetical order, the rst feature will be the rst word alphabetically in the vocabulary.




Implement a binary classi er with perceptron weight update as shown below. Use learning rate =1.







Algorithm 1 Online Binary-Classi er Learning Algorithm







Input: D = Training examples, T = maximum number of training iterations

Output: w, the nal weight vector




Initialize the weights w = 0



2: for each training iteration itr 2 f1; 2; ; T g do

for each training example (xt; yt) 2 D do
y^t = sign(w xt) // predict using the current weights



if mistake then
w = w + yt xt // update the weights



end if



end for



end for



return nal weight vector w









3



Compute the the number of mistakes made during each iteration (1 to 20).



Compute the training accuracy and testing accuracy after each iteration (1 to 20).






Compute the training accuracy and testing accuracy after 20 iterations with stan-dard perceptron and averaged perceptron.






Implement a multi-class online learning algorithm with perceptron weight update as shown below. Use learning rate =1.







Algorithm 2 Online Multi-Class Classi er Learning Algorithm







Input: D = Training examples, k = number of classes, T = maximum number of training iterations

Output: w1; w2; ; wk the nal weight vectors for k classes

Initialize the weights w1 = 0; w2 = 0; ; wk = 0



2: for each training iteration itr 2 f1; 2; ; T g do

for each training example (xt; yt) 2 D do



y^t = arg maxy2f1;2; ;kg wy xt // predict using the current weights
if mistake then
wyt = wyt + xt // update the weights
7: wy^t = wy^t xt // update the weights

end if



end for



end for
return nal weight vectors w1; w2; ; wk









Task. You will build a optical character recognition (OCR) classi er: given an image of handwritten character, we need to predict the corresponding letter. You are provided with a training and testing set.




Data format. Each non-empty line corresponds to one input-output pair. 128 binary values after \im" correspond to the input features (pixel values of a binary image). The letter immediately afterwards is the corresponding output label.




Compute the the number of mistakes made during each iteration (1 to 20).



Compute the training accuracy and testing accuracy after each iteration (1 to 20).






Compute the training accuracy and testing accuracy after 20 iterations with stan-dard perceptron and averaged perceptron.






Instructions for Code Submission and Output Format.




Please follow the below instructions. It will help us in grading your programming part of the homework. We will provide a dropbox folder link for code submission.




4
Supported programming languages: Python, Java, C++




Store all the relevant les in a folder and submit the corresponding zip le named after your student-id, e.g., 114513209.zip




This folder should have a script le named




run_code.sh




Executing this script should do all the necessary steps required for executing the code including compiling, linking, and execution




Assume relative le paths in your code. Some examples: ‘‘./filename.txt’’ or ‘‘../hw1/filename.txt’’




The output of your program should be dumped in a le named \output.txt" in the following format. One block for binary classi er and another similar block for the multi-class classi er.




iteration-1 no-of-mistakes










iteration-20 no-of-mistakes




iteration-1 training-accuracy testing-accuracy







iteration-20 training-accuracy testing-accuracy




training-accuracy-standard-perceptron testing-accuracy-averaged-perceptron




Explanation. Self-explanatory




Make sure the output.txt le is dumped when you execute the script run_code.sh




Zip the entire folder and submit it as




<student_id.zip

5
Grading Rubric




Each question in the students work will be assigned a letter grade of either A,B,C,D, or F by the Instructor and TAs. This ve-point (discrete) scale is described as follows:




A) Exemplary (=100%).




Solution presented solves the problem stated correctly and meets all requirements of the problem.




Solution is clearly presented.




Assumptions made are reasonable and are explicitly stated in the solution.




Solution represents an elegant and e ective way to solve the problem and is not overly complicated than is necessary.







B) Capable (=75%).




Solution is mostly correct, satisfying most of the above criteria under the exemplary category, but contains some minor pitfalls, errors/ aws or limitations.







C) Needs Improvement (=50%).




Solution demonstrates a viable approach toward solving the problem but contains some major pitfalls, errors/ aws or limitations.







D) Unsatisfactory (=25%)




Critical elements of the solution are missing or signi cantly awed.




Solution does not demonstrate su cient understanding of the problem and/or any rea-sonable directions to solve the problem.







F) Not attempted (=0%) No solution provided.







The points on a given homework question will be equal to the percentage assigned (given by the letter grades shown above) multiplied by the maximum number of possible points worth for that question. For example, if a question is worth 6 points and the answer is awarded a B grade, then that implies 4.5 points out of 6.








































6

More products