In this homework, you will implement a decision tree regression algorithm in R, Matlab, or Python. Here are the steps you need to follow:
You are given a univariate regression data set, which contains 133 data points, in the file named hw05_data_set.csv. Divide the data set into two parts by assigning the first 100 data points to the training set and the remaining 33 data points to the test set.
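A minimal Python sketch of this step follows. It assumes the CSV has a one-line header and two comma-separated columns, x first and y second. The header and the column order are both assumptions, so adjust `skip_header` and the column indices to match the actual file. The function name `load_and_split` is hypothetical.

```python
import numpy as np

def load_and_split(path, n_train=100):
    """Load a two-column (x, y) CSV and split it by row order.

    Assumes one header line, x in column 0 and y in column 1.
    """
    data = np.genfromtxt(path, delimiter=",", skip_header=1)
    x, y = data[:, 0], data[:, 1]
    # The first n_train rows form the training set; the remaining rows form the test set.
    return (x[:n_train], y[:n_train]), (x[n_train:], y[n_train:])
```

Usage: `(x_train, y_train), (x_test, y_test) = load_and_split("hw05_data_set.csv")`. With 133 rows, this gives 100 training points and 33 test points.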
Implement a decision tree regression algorithm using the following pre-pruning rule: if a node has P or fewer data points, convert it into a terminal node and do not split it further, where P is a user-defined parameter.
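Below is one possible Python sketch of the learning and prediction steps. It grows the tree recursively and stores the mean response of each terminal node as that node's prediction. To choose each split, it tries the midpoints between consecutive distinct x values and keeps the one that gives the smallest total sum of squared errors across the two children. The assignment fixes only the pre-pruning rule. The squared-error criterion, the midpoint thresholds, the extra safeguard for nodes whose x values are all identical, and the names `learn_tree` and `predict` are assumptions of this sketch.

```python
import numpy as np

def learn_tree(x, y, P):
    """Grow a regression tree on 1-D inputs with pre-pruning parameter P.

    Returns a nested dict: terminal nodes hold {"value": mean}, internal
    nodes hold {"threshold": t, "left": subtree, "right": subtree}.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    unique_x = np.unique(x)
    # Pre-pruning rule: P or fewer points -> terminal node.
    # Safeguard (assumption): a node whose x values are all equal cannot be split.
    if len(y) <= P or len(unique_x) == 1:
        return {"value": float(np.mean(y))}
    best_error, best_t = np.inf, None
    # Candidate thresholds: midpoints between consecutive distinct x values,
    # so both children are always non-empty.
    for t in (unique_x[:-1] + unique_x[1:]) / 2:
        left, right = y[x <= t], y[x > t]
        error = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if error < best_error:
            best_error, best_t = error, t
    mask = x <= best_t
    return {
        "threshold": float(best_t),
        "left": learn_tree(x[mask], y[mask], P),
        "right": learn_tree(x[~mask], y[~mask], P),
    }

def predict(tree, x_new):
    """Predict each point by walking from the root down to a terminal node."""
    out = []
    for v in np.atleast_1d(x_new):
        node = tree
        while "value" not in node:
            node = node["left"] if v <= node["threshold"] else node["right"]
        out.append(node["value"])
    return np.array(out)
```

Because the recursion stops as soon as a node holds P or fewer points, larger values of P produce shallower trees with fewer, wider steps in the fitted function.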
Learn a decision tree by setting the pre-pruning parameter P to 10. Draw the training data points, the test data points, and your fit in the same figure. Your figure should be similar to the following figure.
[Figure: panel titled "P=10" showing the training and test data points together with the fitted tree; horizontal axis x, vertical axis y.]
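A regression tree's fit is piecewise constant. One simple way to draw it is to evaluate the predictor on a fine, evenly spaced grid over the x range and connect the resulting values. The sketch below assumes a callable `predict_fn` that maps an array of x values to predictions. This is a hypothetical name, so pass in your own tree's prediction function. Drawing uses matplotlib. The grid step of 0.01, the marker styles, and the figure size are arbitrary choices.

```python
import numpy as np

def fit_curve(predict_fn, x_min, x_max, step=0.01):
    """Evaluate a fitted model on an evenly spaced grid for plotting."""
    grid = np.arange(x_min, x_max + step, step)
    return grid, np.asarray(predict_fn(grid))

def plot_fit(predict_fn, x_train, y_train, x_test, y_test, P):
    """Draw training points, test points, and the fit in one figure."""
    import matplotlib.pyplot as plt  # imported here so fit_curve works without matplotlib
    all_x = np.concatenate([x_train, x_test])
    grid, fitted = fit_curve(predict_fn, all_x.min(), all_x.max())
    plt.figure(figsize=(10, 4))
    plt.plot(x_train, y_train, "b.", label="training")
    plt.plot(x_test, y_test, "r.", label="test")
    plt.plot(grid, fitted, "k-")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.title(f"P={P}")
    plt.legend()
    plt.show()
```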
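Calculate the root mean squared error (RMSE) on the test data points. The formula for RMSE can be written as:
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N_{test}} (y_i - \widehat{y}_i)^2}{N_{test}}}$$
where $y_i$ is the true response of the $i$-th test point, $\widehat{y}_i$ is your tree's prediction for it, and $N_{test} = 33$ is the number of test points.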
Your output should be similar to the following sentence.
RMSE is 27.6841 when P is 10
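The formula maps directly onto array operations. Here is a small sketch that assumes NumPy arrays of true and predicted test responses. The function names `rmse` and `report` are hypothetical. The four decimal places match the sample sentence.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(sum((y - y_hat)^2) / N)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def report(y_true, y_pred, P):
    """Format the result in the requested sentence style."""
    return f"RMSE is {rmse(y_true, y_pred):.4f} when P is {P}"
```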
Learn decision trees by setting the pre-pruning parameter P to 1, 2, 3, …, 20. Draw the RMSE on the test data points as a function of P. Your figure should be similar to the following figure.
[Figure: test RMSE (vertical axis) plotted against P (horizontal axis).]
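The sweep amounts to a loop that retrains once per value of P and records the test RMSE each time. This sketch assumes a hypothetical callable `fit_predict(x_train, y_train, x_test, P)` that trains a tree with pre-pruning parameter P and returns predictions for `x_test`. Substitute your own implementation.

```python
import numpy as np

def rmse_by_P(fit_predict, x_train, y_train, x_test, y_test, P_values=range(1, 21)):
    """Compute the test RMSE for each pre-pruning parameter P.

    fit_predict(x_train, y_train, x_test, P) is assumed to train a tree with
    pre-pruning parameter P and return predictions for x_test.
    """
    P_values = list(P_values)
    y_test = np.asarray(y_test, dtype=float)
    errors = []
    for P in P_values:
        y_hat = np.asarray(fit_predict(x_train, y_train, x_test, P), dtype=float)
        errors.append(float(np.sqrt(np.mean((y_test - y_hat) ** 2))))
    return P_values, errors
```

The returned lists can be drawn with matplotlib, for example `plt.plot(Ps, errors, "ko-")` followed by `plt.xlabel("P")` and `plt.ylabel("RMSE")`.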
What to submit: You need to submit your source code in a single file (.R file if you are using R, .m file if you are using Matlab, or .py file if you are using Python) and a short report explaining your approach (.doc, .docx, or .pdf file). Put these two files in a single zip file named STUDENTID.zip, where STUDENTID should be replaced with your 7-digit student number.
How to submit: E-mail the zip file you created to aghanem15@ku.edu.tr with the subject line Intro2MachineLearningHW05. Please use exactly this subject line, and do not send a zip file literally named STUDENTID.zip. Submissions that do not follow these guidelines will not be graded.
Late submission policy: Late submissions will not be graded.
Cheating policy: Very similar submissions will not be graded.