In this exercise, you will implement logistic regression and apply it to two different datasets. Before starting on the programming exercise, we strongly recommend watching the video lectures and completing the review questions for the associated topics.
To get started with the exercise, you will need to download the starter code and unzip its contents to the directory where you wish to complete the exercise. If needed, use the cd command in Octave/MATLAB to change to this directory before starting this exercise.
You can also find instructions for installing Octave/MATLAB in the “Environment Setup Instructions” of the course website.
Files included in this exercise
ex2.m - Octave/MATLAB script that steps you through the exercise
ex2_reg.m - Octave/MATLAB script for the later parts of the exercise
ex2data1.txt - Training set for the first half of the exercise
ex2data2.txt - Training set for the second half of the exercise
submit.m - Submission script that sends your solutions to our servers
mapFeature.m - Function to generate polynomial features
plotDecisionBoundary.m - Function to plot classifier’s decision boundary
[*] plotData.m - Function to plot 2D classification data
[*] sigmoid.m - Sigmoid Function
[*] costFunction.m - Logistic Regression Cost Function
[*] predict.m - Logistic Regression Prediction Function
[*] costFunctionReg.m - Regularized Logistic Regression Cost
* indicates files you will need to complete
Throughout the exercise, you will be using the scripts ex2.m and ex2_reg.m. These scripts set up the dataset for the problems and make calls to functions that you will write. You do not need to modify either of them. You are only required to modify functions in other files, by following the instructions in this assignment.
Where to get help
The exercises in this course use Octave¹ or MATLAB, a high-level programming language well-suited for numerical computations. If you do not have Octave or MATLAB installed, please refer to the installation instructions in the “Environment Setup Instructions” of the course website.
At the Octave/MATLAB command line, typing help followed by a function name displays documentation for a built-in function. For example, help plot will bring up help information for plotting. Further documentation for Octave functions can be found at the Octave documentation pages. MATLAB documentation can be found at the MATLAB documentation pages.
We also strongly encourage using the online Discussions to discuss exercises with other students. However, do not look at any source code written by others or share your source code with others.
1 Logistic Regression
In this part of the exercise, you will build a logistic regression model to predict whether a student gets admitted into a university.
Suppose that you are the administrator of a university department and you want to determine each applicant’s chance of admission based on their results on two exams. You have historical data from previous applicants that you can use as a training set for logistic regression. For each training example, you have the applicant’s scores on two exams and the admissions decision.
Your task is to build a classification model that estimates an applicant’s probability of admission based on the scores from those two exams. This outline and the framework code in ex2.m will guide you through the exercise.
¹ Octave is a free alternative to MATLAB. For the programming exercises, you are free to use either Octave or MATLAB.
1.1 Visualizing the data
Before starting to implement any learning algorithm, it is always good to visualize the data if possible. In the first part of ex2.m, the code will load the data and display it on a 2-dimensional plot by calling the function plotData.
You will now complete the code in plotData so that it displays a figure like Figure 1, where the axes are the two exam scores, and the positive and negative examples are shown with different markers.
Figure 1: Scatter plot of training data (x-axis: Exam 1 score; y-axis: Exam 2 score)
To help you get more familiar with plotting, we have left plotData.m empty so you can try to implement it yourself. However, this is an optional (ungraded) exercise. We also provide our implementation below so you can copy it or refer to it. If you choose to copy our example, make sure you learn what each of its commands is doing by consulting the Octave/MATLAB documentation.
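% Find indices of positive and negative examples
pos = find(y == 1);
neg = find(y == 0);

% Plot examples: black '+' for positives, yellow-filled circles for negatives
hold on;  % keep both series on the same axes (the starter plotData.m may already do this)
plot(X(pos, 1), X(pos, 2), 'k+', 'LineWidth', 2, ...
     'MarkerSize', 7);
plot(X(neg, 1), X(neg, 2), 'ko', 'MarkerFaceColor', 'y', ...
     'MarkerSize', 7);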
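To see what the indexing step in the plotting code produces, here is a small sketch on a made-up dataset (the values of X and y below are hypothetical, not taken from ex2data1.txt). find returns the row indices where the condition holds, and X(pos, :) then selects those rows; logical indexing with X(y == 1, :) selects the same rows without building the intermediate index vector.

```octave
% Hypothetical toy data: 5 examples, 2 features (exam scores), labels in y
X = [10 20; 30 40; 50 60; 70 80; 90 95];
y = [1; 0; 0; 1; 1];

% Row indices of positive and negative examples
pos = find(y == 1);   % column vector [1; 4; 5]
neg = find(y == 0);   % column vector [2; 3]

% Rows of X belonging to each class
Xpos = X(pos, :);     % [10 20; 70 80; 90 95]
Xneg = X(neg, :);     % [30 40; 50 60]

% Logical indexing selects the same rows directly
Xpos_logical = X(y == 1, :);
```

Either form works in plotData; find is used in the provided code, while logical indexing is a common shorthand.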
1.2 Implementation
1.2.1 Warmup exercise: sigmoid function
Before you start with the actual cost function, recall that the logistic regression hypothesis is defined as:
$$h_\theta(x) = g(\theta^T x),$$
where the function g is the sigmoid function. The sigmoid function is defined as:
$$g(z) = \frac{1}{1 + e^{-z}}.$$
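Working a few values by hand from this definition shows the behaviour you will be asked to check: the midpoint is exactly one half, the function saturates towards 1 and 0 at the extremes, and it is symmetric about $z = 0$ in the sense that $g(-z) = 1 - g(z)$.

$$g(0) = \frac{1}{1 + e^{0}} = \frac{1}{2}, \qquad \lim_{z \to \infty} g(z) = 1, \qquad \lim_{z \to -\infty} g(z) = 0,$$
$$g(-z) = \frac{1}{1 + e^{z}} = \frac{e^{-z}}{e^{-z} + 1} = 1 - g(z), \qquad \text{e.g. } g(2) \approx 0.8808,\; g(-2) \approx 0.1192.$$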
Your first step is to implement this function in sigmoid.m so it can be called by the rest of your program. When you are finished, try testing a few values by calling sigmoid(x) at the Octave/MATLAB command line. For large positive values of x, the sigmoid should be close to 1, while for large negative values, the sigmoid should be close to 0. Evaluating sigmoid(0) should give you exactly 0.5. Your code should also work with vectors and matrices. For a matrix, your function should perform the sigmoid function on every element.
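The checks described above can be run at the command line. This is a sketch only: to keep it self-contained it defines an anonymous stand-in g (a hypothetical name, not part of the starter code) using the element-wise operators ./ and exp, which act on every entry of a vector or matrix. When testing your own work, call sigmoid from your sigmoid.m in place of g.

```octave
% Stand-in used only to illustrate the expected behaviour of sigmoid.m.
% ./ and exp are element-wise, so g accepts scalars, vectors and matrices.
g = @(z) 1 ./ (1 + exp(-z));

g(0)                 % exactly 0.5
g(10)                % close to 1
g(-10)               % close to 0
g([-1 0 1; 2 3 4])   % 2x3 result, one sigmoid value per element
```

Using / instead of ./ would request matrix division rather than element-wise division, so it would not return one sigmoid value per element for vector or matrix inputs.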
You can submit your solution for grading by typing submit at the Octave/MATLAB command line. The submission script will prompt you for your login e-mail and submission token and ask you which files you want to submit. You can obtain a submission token from the web page for the assignment.