In this exercise, you will be using support vector machines (SVMs) to build a spam classifier. Before starting on the programming exercise, we strongly recommend watching the video lectures and completing the review questions for the associated topics.
To get started with the exercise, you will need to download the starter code and unzip its contents to the directory where you wish to complete the exercise. If needed, use the cd command in Octave/MATLAB to change to this directory before starting this exercise.
You can also find instructions for installing Octave/MATLAB in the “Environment Setup Instructions” of the course website.
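As a minimal sketch of the directory step described above, the following Octave/MATLAB snippet changes into the exercise directory and checks that the main script is visible. The folder name used for exerciseDir is a hypothetical placeholder, not a path given in this assignment; replace it with wherever you unzipped the starter code.

```octave
% Hypothetical location of the unzipped starter code (an assumption for
% illustration only); adjust this to match your own setup.
exerciseDir = fullfile(pwd, 'machine-learning-ex6', 'ex6');

% Change into the exercise directory only if it actually exists.
% exist(..., 'dir') returns 7 when the name refers to a directory.
if exist(exerciseDir, 'dir') == 7
  cd(exerciseDir);
end

% Check that the main script can be found. exist(..., 'file') returns 2
% when the file is in the current directory or on the load path.
if exist('ex6.m', 'file') == 2
  disp('Found ex6.m; ready to start.');
else
  disp('ex6.m not found; check the current directory with pwd.');
end
```

If the second message appears, running pwd and ls at the prompt shows where you currently are and which files are present.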
Files included in this exercise
ex6.m - Octave/MATLAB script for the first half of the exercise
ex6data1.mat - Example Dataset 1
ex6data2.mat - Example Dataset 2
ex6data3.mat - Example Dataset 3
svmTrain.m - SVM training function
svmPredict.m - SVM prediction function
plotData.m - Plot 2D data
visualizeBoundaryLinear.m - Plot linear boundary
visualizeBoundary.m - Plot non-linear boundary
linearKernel.m - Linear kernel for SVM
[?] gaussianKernel.m - Gaussian kernel for SVM
[?] dataset3Params.m - Parameters to use for Dataset 3
ex6_spam.m - Octave/MATLAB script for the second half of the exercise
spamTrain.mat - Spam training set
spamTest.mat - Spam test set
emailSample1.txt - Sample email 1
emailSample2.txt - Sample email 2
spamSample1.txt - Sample spam 1
spamSample2.txt - Sample spam 2
vocab.txt - Vocabulary list
getVocabList.m - Load vocabulary list
porterStemmer.m - Stemming function
readFile.m - Reads a file into a character string
submit.m - Submission script that sends your solutions to our servers
[?] processEmail.m - Email preprocessing
[?] emailFeatures.m - Feature extraction from emails
[?] indicates files you will need to complete
Throughout the exercise, you will be using the scripts ex6.m and ex6_spam.m. These scripts set up the dataset for the problems and make calls to functions that you will write. You are only required to modify functions in other files, by following the instructions in this assignment.
Submission and Grading
After completing various parts of the assignment, be sure to use the submit function to submit your solutions to our servers. The following is a breakdown of how each part of this exercise is scored.
Part                               Submitted File       Points
Gaussian Kernel                    gaussianKernel.m     25 points
Parameters (C, σ) for Dataset 3    dataset3Params.m     25 points
Email Preprocessing                processEmail.m       25 points
Email Feature Extraction           emailFeatures.m      25 points
Total Points                                            100 points
You are allowed to submit your solutions multiple times, and we will take only the highest score into consideration.