HW05: Decision Tree Regression Solution


In this homework, you will implement a decision tree regression algorithm in R, Matlab, or Python. Here are the steps you need to follow:

You are given a univariate regression data set, which contains 133 data points, in the file named hw05_data_set.csv. Divide the data set into two parts by assigning the first 100 data points to the training set and the remaining 33 data points to the test set.
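A minimal Python sketch of this split follows. The helper name split_train_test is hypothetical, and the commented loading line assumes the CSV has one header row followed by an x column and a y column, which the assignment does not state. Synthetic data stands in for the file so the sketch runs on its own.

```python
import numpy as np

def split_train_test(x, y, n_train=100):
    """Assign the first n_train points to training and the rest to test."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return x[:n_train], y[:n_train], x[n_train:], y[n_train:]

# With the real file, something like the following may work
# (assumed layout: header row, then columns x and y):
# data = np.genfromtxt("hw05_data_set.csv", delimiter=",", skip_header=1)
# x_all, y_all = data[:, 0], data[:, 1]

# Synthetic stand-in with 133 rows, NOT the homework data.
rng = np.random.default_rng(0)
x_all = rng.uniform(0, 60, size=133)
y_all = rng.normal(size=133)

x_train, y_train, x_test, y_test = split_train_test(x_all, y_all)
print(len(x_train), len(x_test))  # 100 33
```

The split is positional rather than random because the assignment asks for the first 100 rows specifically.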

Implement a decision tree regression algorithm using the following pre-pruning rule: If a node has P or fewer data points, convert this node into a terminal node and do not split further, where P is a user-defined parameter.
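One way the pre-pruning rule could be implemented is sketched below in Python. Several choices here are assumptions that the assignment does not specify: candidate thresholds are midpoints between consecutive unique x values, the split is chosen to minimise the summed squared error of the two children, and each terminal node predicts the mean of its targets. The names fit_tree and predict are hypothetical.

```python
import numpy as np

def fit_tree(x, y, P):
    """Recursively build a regression tree on univariate inputs.

    A node with P or fewer points (or with no usable split) becomes a
    terminal node that predicts the mean of its targets.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    leaf = {"leaf": True, "value": float(np.mean(y))}
    unique_x = np.unique(x)
    if len(y) <= P or len(unique_x) == 1:
        return leaf
    # Assumed candidate thresholds: midpoints between consecutive unique x values.
    candidates = (unique_x[:-1] + unique_x[1:]) / 2
    best_split, best_error = None, np.inf
    for s in candidates:
        left, right = y[x <= s], y[x > s]
        if left.size == 0 or right.size == 0:
            continue  # guard against a degenerate threshold
        error = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if error < best_error:
            best_split, best_error = float(s), error
    if best_split is None:
        return leaf
    mask = x <= best_split
    return {"leaf": False, "split": best_split,
            "left": fit_tree(x[mask], y[mask], P),
            "right": fit_tree(x[~mask], y[~mask], P)}

def predict(tree, x):
    """Predict for each value in x by walking from the root to a terminal node."""
    out = []
    for x0 in np.asarray(x, dtype=float):
        node = tree
        while not node["leaf"]:
            node = node["left"] if x0 <= node["split"] else node["right"]
        out.append(node["value"])
    return np.array(out)

# Toy data, NOT the homework data: two flat groups of three points each.
x_toy = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y_toy = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
tree = fit_tree(x_toy, y_toy, P=3)
print(tree["split"])            # 6.5
print(predict(tree, [2, 11]))   # [1. 5.]
```

In the toy run the root holds six points, which exceeds P = 3, so it is split once; each child then holds exactly three points and becomes a terminal node.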

Learn a decision tree by setting the pre-pruning parameter to 10. Draw training data points, test data points, and your fit in the same figure. Your figure should be similar to the following figure.

[Figure: training data points, test data points, and the decision tree fit for P = 10; horizontal axis x, vertical axis y.]

Calculate the root mean squared error for test data points. The formula for RMSE can be written as:
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N_{\text{test}}} \left(y_i - \hat{y}_i\right)^2}{N_{\text{test}}}}$$

Your output should be similar to the following sentence.

RMSE is 27.6841 when P is 10
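The RMSE computation and the output sentence can be sketched in Python as follows. The function name rmse and the toy values are illustrative assumptions; the numbers are not from the homework data set.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy values: both errors have magnitude 2, so the RMSE is 2.
P = 10
value = rmse([1.0, 5.0], [3.0, 3.0])
print(f"RMSE is {value:.4f} when P is {P}")  # RMSE is 2.0000 when P is 10
```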

Learn decision trees by setting the pre-pruning parameter to 1, 2, 3, …, 20. Draw RMSE for test data points as a function of P. Your figure should be similar to the following figure.

[Figure: RMSE for test data points as a function of P, for P = 1, …, 20; horizontal axis P, vertical axis RMSE.]
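The sweep over P might look like the Python sketch below. Because the input is univariate, the tree is stored here as a step function (sorted thresholds plus one value per interval), which is a design choice of this sketch and not something the assignment requires. The names tree_steps, predict_steps, and rmse are hypothetical, the split criterion (midpoint thresholds, minimum summed squared error) is an assumption, and the data are synthetic stand-ins for hw05_data_set.csv.

```python
import numpy as np

def tree_steps(x, y, P):
    """Fit a pre-pruned regression tree; return (thresholds, values) of its step function."""
    ux = np.unique(x)
    if len(y) <= P or len(ux) == 1:
        return [], [float(y.mean())]
    best_s, best_err = None, np.inf
    for s in (ux[:-1] + ux[1:]) / 2:  # assumed candidates: midpoints
        left, right = y[x <= s], y[x > s]
        if left.size == 0 or right.size == 0:
            continue
        err = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if err < best_err:
            best_s, best_err = float(s), err
    if best_s is None:
        return [], [float(y.mean())]
    mask = x <= best_s
    lt, lv = tree_steps(x[mask], y[mask], P)
    rt, rv = tree_steps(x[~mask], y[~mask], P)
    return lt + [best_s] + rt, lv + rv

def predict_steps(thresholds, values, x):
    # side="left" sends x <= threshold to the interval on the left of it.
    idx = np.searchsorted(thresholds, x, side="left")
    return np.asarray(values)[idx]

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

# Synthetic stand-in data (133 points), NOT the homework data.
rng = np.random.default_rng(0)
x_all = rng.uniform(0, 60, size=133)
y_all = 30 * np.sin(x_all / 8) + rng.normal(0, 10, size=133)
x_train, y_train = x_all[:100], y_all[:100]
x_test, y_test = x_all[100:], y_all[100:]

P_values = list(range(1, 21))
rmse_values = []
for P in P_values:
    thresholds, values = tree_steps(x_train, y_train, P)
    rmse_values.append(rmse(y_test, predict_steps(thresholds, values, x_test)))
    print(f"RMSE is {rmse_values[-1]:.4f} when P is {P}")

# Optional plot, if matplotlib is installed:
# import matplotlib.pyplot as plt
# plt.plot(P_values, rmse_values, "ko-"); plt.xlabel("P"); plt.ylabel("RMSE"); plt.show()
```

The printed values depend on the synthetic data and will differ from the figures quoted in the assignment.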

What to submit: You need to submit your source code in a single file (.R file if you are using R, .m file if you are using Matlab, or .py file if you are using Python) and a short report explaining your approach (.doc, .docx, or .pdf file). You will put these two files in a single zip file named as STUDENTID.zip, where STUDENTID should be replaced with your 7-digit student number.

How to submit: E-mail the zip file you created to aghanem15@ku.edu.tr with the subject line Intro2MachineLearningHW05. Please follow the exact style mentioned for the subject line, and do not send a zip file literally named STUDENTID.zip; replace STUDENTID with your own student number. Submissions that do not follow these guidelines will not be graded.

Late submission policy: Late submissions will not be graded.

Cheating policy: Very similar submissions will not be graded.