Homework #3 Solution

R Programming Submission Instructions

Make sure you clearly list each team member's name and Unity ID at the top of your submission.

Your code should be named hw3.R. Add this file, along with a README, to the zip file mentioned on the first page.

Failure to follow naming conventions or the programming-related instructions specified below may result in your submission not being graded.

Carefully read which function names have been requested by the instructor. In this homework and the following ones, if your code does not follow the naming format requested by the instructor, you will not receive credit.

For each function, both the input and output formats are provided in hw3.R. Function calls are specified in hw3_checker.R. Please ensure that you follow the correct input and output formats. Once again, if you do not follow the format requested, you will not receive credit. The comments in hw3.R clearly state which functions you need to implement.

You are free to write your own functions to handle sub-tasks, but the TA will only call the functions he has requested. If the requested functions do not run, do not return the correct values, or do not finish running in the specified time, you will not receive full credit.

DO NOT set the working directory (setwd function) or clear memory (rm(list=ls(all=T))) in your hw3.R code. The TA(s) will do so in their own autograder.

The TA will have an autograder which will first run source(hw3.R), then call each of the functions requested in the homework and compare the results with the correct solution.

Your code should be clearly documented.

To test your code, step through the hw3_checker.R file. If you update your code, make sure to run source('./hw3.R') again to update your function definitions. You can also check the "Source on save" option in RStudio to do this automatically on save.

Please DO NOT include install.packages() or install_github() in your hw3.R. Comment them out. Please note that we are specifying the allowed packages, which means that we already have them installed on our test machine. Having uncommented install.packages() or install_github() calls in your code will result in a penalty of 5 points.
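As a rough illustration of the rules above, the sketch below sources a stand-in script into a fresh environment and scans it for uncommented calls that the instructions forbid. The file contents, the names example_fn and find_forbidden, and the scanning approach are assumptions made for illustration; the actual autograder is not described in this handout.

```r
# Hypothetical stand-in for hw3.R, written to a temp file so the sketch is self-contained.
student_file <- tempfile(fileext = ".R")
writeLines(c(
  "# install.packages('glmnet')  # commented out, as the instructions require",
  "example_fn <- function(x) x^2  # hypothetical function name"
), student_file)

# Return the non-comment lines that contain a forbidden call.
find_forbidden <- function(code_lines) {
  # Drop lines that are entirely comments before searching.
  active <- code_lines[!grepl("^[[:space:]]*#", code_lines)]
  pattern <- "(install\\.packages|install_github|setwd)[[:space:]]*\\("
  active[grepl(pattern, active)]
}

# Source into a fresh environment, roughly what a grader might do (an assumption).
grader_env <- new.env()
source(student_file, local = grader_env)

offending <- find_forbidden(readLines(student_file))
result <- grader_env$example_fn(3)
```

Running a scan like this on your own hw3.R before zipping is a cheap way to catch a stray setwd() or install.packages() call.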

Problems

BN Inference (12 points) [Song Ju]. Compute the following probabilities according to the Bayesian net shown in Figure 1. ( means not)

Figure 1: BN Inference

Compute $P(D, B \mid A)$. Show your work.

Compute $P(C)$. Show your work.

Compute $P(F)$. Show your work.

Compute $P(B, C, D, E, F)$. Show your work.

SVM Theory (20 points) [Song Ju].

Support vector machines (SVMs) learn the decision boundary that gives the largest margin between classes. In this question, you will train an SVM on a tiny dataset with 4 data points, shown in Figure 2. This dataset consists of two points in Class 1 (y = 1) and two points in Class 2 (y = -1). Each data point has two non-class attributes: X1 and X2.

Figure 2: SVM

Assume that w1 = w2. Find the weight vector w and bias b for the decision boundary of the SVM. What is the equation corresponding to this decision boundary?

Circle the support vectors and draw the decision boundary.

Given the 2-dimensional data points $X_i$, $i \in [1, 2, 3, 4]$, shown in Table 1, in this question you will employ a kernel function so that an SVM can classify these four data points.

Data ID    x1    x2     y
X1          0     1    -1
X2          0    -1    -1
X3          1     0     1
X4         -1     0     1

Table 1: Four Data Points

(i) Suppose the kernel function is $K(X, Z) = (1 + 2\, X \cdot Z)^2$, where $X$ and $Z$ indicate two data points. This kernel is equal to an inner product $\phi(X) \cdot \phi(Z)$ for a certain function $\phi$. Calculate the function $\phi$.

(ii) Transform the four given data points $X_i$, $i \in [1, 2, 3, 4]$, to the higher-dimensional space using the function $\phi$ that you derived in part (i). Report $\phi(X_i)$ for $i \in [1, 2, 3, 4]$.

(iii) Assume that the four transformed data points that you got from part (ii) are all support vectors. Apply Lagrange multipliers to determine the maximum-margin linear decision boundary in the transformed higher-dimensional space. Note: this will involve solving a system of equations.
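A hand-derived feature map can be checked numerically by comparing the kernel value with the inner product of the mapped points. The sketch below assumes the kernel reads K(X, Z) = (1 + 2 X·Z)^2 and uses one candidate map obtained by expanding the square; the names kernel, phi, K_direct and K_mapped are illustrative, and other maps that differ by ordering or sign of components would pass the same check.

```r
# Kernel as read from the problem statement (assumed form): K(x, z) = (1 + 2 * x.z)^2
kernel <- function(x, z) (1 + 2 * sum(x * z))^2

# Candidate feature map from expanding the square (an assumption to be verified).
phi <- function(x) {
  c(1, 2 * x[1], 2 * x[2], 2 * x[1]^2, 2 * sqrt(2) * x[1] * x[2], 2 * x[2]^2)
}

# The four points from Table 1, one per row.
X <- rbind(X1 = c(0, 1), X2 = c(0, -1), X3 = c(1, 0), X4 = c(-1, 0))

# Gram matrices computed directly from the kernel and via the feature map.
n <- nrow(X)
K_direct <- matrix(0, n, n)
K_mapped <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    K_direct[i, j] <- kernel(X[i, ], X[j, ])
    K_mapped[i, j] <- sum(phi(X[i, ]) * phi(X[j, ]))
  }
}

# Largest disagreement between the two computations; near zero if phi is valid.
max_gap <- max(abs(K_direct - K_mapped))
```

Applying phi to each row of X gives the transformed points, which can be compared against values worked out by hand.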

Linear Regression (15 points) [Ruth Okoilu].

Given the following three training data points of the form (x, y): (2, 5), (0, 2), (3, 3), estimate the parameters for linear regression of the form $y = w_1 x^2 + w_0$.

Note that x is squared in the formula.

Determine the values of $w_1$ and $w_0$ and show each step of your work.

Calculate the training RMSE for the fitted linear regression.
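A hand calculation for this problem can be cross-checked in base R by regressing y on the squared input. The sketch below is only a check, not a substitute for the worked steps the question asks for; it assumes RMSE is defined with the number of training points in the denominator, and the variable names are illustrative.

```r
# The three training points from the problem statement.
x <- c(2, 0, 3)
y <- c(5, 2, 3)

# Ordinary least squares on the transformed feature x^2.
fit <- lm(y ~ I(x^2))
w0 <- coef(fit)[["(Intercept)"]]
w1 <- coef(fit)[["I(x^2)"]]

# Training RMSE, dividing by n (an assumed convention).
rmse <- sqrt(mean(residuals(fit)^2))
```

Because the model is linear in $w_1$ and $w_0$, the same normal equations used for ordinary linear regression apply once $x^2$ is treated as the input.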

Programming (33 points) [Krishna Gadiraju] In this question, you will be performing a variety of machine learning operations: regression and classification.

PART-1: Regression (15 points)

Dataset description: You are provided a dataset with 20 variables. Variables x1 to x19 are the independent variables, while variable y is your dependent variable. Training data is stored in the file data/regression-train.csv, and test data is stored in the file data/regression-test.csv.

Note: The TA will use a different version of data/regression-test.csv. The format (independent variables x1 to x19, dependent variable y) will be similar, but the TA's file may contain a different number of data points than the file supplied to you. Please ensure you take this into account, and do not hard code any dimensions.

In this exercise, you will apply three different types of regression methods to the dataset supplied to you, and then compare their results:

Learning: You will write code in the function alda_regression() to train simple linear regression, ridge regression and lasso regression models. Detailed instructions for implementation and allowed packages have been provided in hw3.R. Note that for the lasso and ridge regression models, you will be using cross-validation to tune the hyperparameter.

Comparison: You will write code in the function regression_compare_rmse() to compare the three regression models from above. Detailed instructions for implementation and allowed packages have been provided in hw3.R.
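The cross-validation step mentioned under Learning can be sketched in base R for the ridge case. This is not the implementation hw3.R asks for: the allowed packages are listed in hw3.R, so the sketch avoids packages altogether, uses synthetic data in place of data/regression-train.csv, and introduces hypothetical helper names (ridge_fit, ridge_predict, cv_ridge, rmse). All dimensions are read from the data rather than hard coded.

```r
set.seed(1)
# Synthetic stand-in data (hypothetical); the real data live in data/regression-train.csv.
n <- 60
p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- as.vector(X %*% c(2, -1, 0, 0, 0.5) + rnorm(n, sd = 0.5))

# Closed-form ridge fit on centered data; the intercept is not penalized.
ridge_fit <- function(X, y, lambda) {
  x_mean <- colMeans(X)
  y_mean <- mean(y)
  Xc <- sweep(X, 2, x_mean)
  beta <- solve(crossprod(Xc) + lambda * diag(ncol(X)), crossprod(Xc, y - y_mean))
  list(beta = as.vector(beta), intercept = y_mean - sum(x_mean * beta))
}
ridge_predict <- function(fit, X) as.vector(X %*% fit$beta + fit$intercept)
rmse <- function(y, yhat) sqrt(mean((y - yhat)^2))

# k-fold cross-validation over a grid of penalties; returns the penalty with lowest mean RMSE.
cv_ridge <- function(X, y, lambdas, k = 5) {
  folds <- sample(rep(seq_len(k), length.out = nrow(X)))
  cv_err <- sapply(lambdas, function(lam) {
    mean(sapply(seq_len(k), function(f) {
      fit <- ridge_fit(X[folds != f, , drop = FALSE], y[folds != f], lam)
      rmse(y[folds == f], ridge_predict(fit, X[folds == f, , drop = FALSE]))
    }))
  })
  list(lambda = lambdas[which.min(cv_err)], cv_err = cv_err)
}

lambdas <- 10^seq(-3, 3, length.out = 13)
cv <- cv_ridge(X, y, lambdas)
best_fit <- ridge_fit(X, y, cv$lambda)
```

The same fold loop applies to lasso; only the fitting routine changes, since lasso has no closed-form solution.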

PART-2: Classification (18 points)

Dataset description: You are provided a dataset with 5 variables. Variables x1 to x4 are the independent variables, while variable class is your class variable. Training data is stored in the file data/classification-train.csv, and test data is stored in the file data/classification-test.csv.

Note: The TA will use a different version of data/classification-test.csv. The format (independent variables x1 to x4, dependent variable class) will be similar, but the TA's file may contain a different number of data points than the file supplied to you. Please ensure you take this into account, and do not hard code any dimensions.

In this exercise, you will apply different types of classification methods to the dataset supplied to you, and then compare their results:

Support Vector Machine: In this exercise, you will use cross-validation to tune hyperparameters for four different types of kernels: linear, radial basis, polynomial and sigmoid. You will write code in the function alda_svm(). Detailed instructions for implementation and allowed packages have been provided in hw3.R.

Comparison: You will write code in classification_compare_accuracy() to compare all 4 SVM kernels. Detailed instructions for implementation and allowed packages have been provided in hw3.R.
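The comparison step reduces to computing test accuracy for each kernel's predictions and picking the largest. The sketch below shows only that bookkeeping in base R, with made-up label vectors standing in for real SVM predictions; the required inputs and return format are defined in hw3.R, so the names and structure here are assumptions.

```r
# Made-up true labels and per-kernel predictions (placeholders, not real model output).
y_test <- c(1, 1, 2, 2, 1, 2, 1, 2)
predictions <- list(
  linear     = c(1, 1, 2, 2, 1, 2, 2, 1),
  radial     = c(1, 1, 2, 2, 1, 2, 1, 1),
  polynomial = c(1, 2, 2, 1, 1, 2, 2, 2),
  sigmoid    = c(2, 2, 2, 2, 1, 1, 1, 1)
)

# Accuracy is the fraction of test points whose predicted label matches the true label.
accuracy <- sapply(predictions, function(pred) mean(pred == y_test))

# Name of the kernel with the highest accuracy (first one wins on ties).
best_kernel <- names(accuracy)[which.max(accuracy)]
```

Because the accuracy is a mean over however many test labels are present, the same code works when the TA's test file has a different number of rows.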

NOTE: Your entire solution hw3.R should not take more than 3 minutes to run. Any solution taking longer will be awarded a zero.
