1 Inductive Construction
Given the dataset in Table 1, construct a decision tree by hand using the top-down algorithm presented in the lecture. Your stopping criteria are zero entropy or a depth of 2 (the root node is at depth 0, the first layer of inner nodes is at depth 1, ...).
Draw your final decision tree and provide your computation steps (information gain for each relevant attribute at each split, class frequencies for each node, ...). Compute the error rate of your decision tree on the training data.
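If you want to check your hand computation, entropy and information gain can be computed from class counts alone. The following Python sketch is purely illustrative; the helper names are not part of the lecture material:

    from math import log2

    def entropy(counts):
        """Entropy of a class distribution given as a list of class counts, e.g. [115, 125]."""
        total = sum(counts)
        return -sum(c / total * log2(c / total) for c in counts if c > 0)

    def information_gain(parent_counts, child_counts_per_value):
        """Parent entropy minus the weighted entropies of the children created by a split."""
        total = sum(parent_counts)
        remainder = sum(sum(child) / total * entropy(child)
                        for child in child_counts_per_value)
        return entropy(parent_counts) - remainder

    # Example: entropy at the root of Table 1 (summing the Instances columns gives 115 "-" and 125 "+")
    print(entropy([115, 125]))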
2 Minimal Error Pruning
Figure 1 shows a decision tree for classifying mushrooms as either edible or poisonous. We used the same dataset as shown in the lecture and constructed the tree with scikit-learn with a maximum depth of 4. Because scikit-learn implements a slightly different algorithm (CART; in the lecture: ID3), which also allows continuous data, we apply one-hot encoding to each variable, resulting in a binary decision tree.
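For reference, a tree like the one in Figure 1 could be produced roughly as follows; this is only a sketch, and the file name and column names are assumptions rather than the exact lecture setup:

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier, export_text

    df = pd.read_csv("mushrooms.csv")              # hypothetical file with categorical columns
    X = pd.get_dummies(df.drop(columns="class"))   # one-hot encoding -> binary features
    y = df["class"]                                # "edible" / "poisonous"

    clf = DecisionTreeClassifier(criterion="entropy", max_depth=4)
    clf.fit(X, y)
    print(export_text(clf, feature_names=list(X.columns)))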
Prune two decisions (inner nodes) of the decision tree based on the error rate. First, compute the overall error rate of the original tree. Second, for each viable node, compute the error rate of the tree that results from removing it. Third, prune the node whose removal yields the lowest error rate. Repeat this process to remove a second inner node.
Draw the decision tree after each pruning step and provide your calculations for each step.
Note: A node is viable for pruning if all of its children are leaf nodes. After pruning, the chosen inner node turns into a leaf node. We define the overall error rate as (slide 44):

\[
\text{error rate} = \frac{\sum_{\text{leaf } \ell} \text{number of misclassified samples in } \ell}{\text{total number of samples}}
\]
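As a sanity check, the overall error rate and the effect of a single pruning step can be computed from the leaf class counts alone. The sketch below uses made-up counts for illustration:

    def misclassified(counts):
        """Misclassified samples in a leaf that predicts its majority class."""
        return sum(counts) - max(counts)

    def error_rate(leaf_counts, n_total):
        """Misclassified samples summed over all leaves, divided by the total number of samples."""
        return sum(misclassified(c) for c in leaf_counts) / n_total

    # Hypothetical counts [edible, poisonous] of the two leaves under a candidate node:
    children = [[40, 5], [3, 12]]
    merged = [40 + 3, 5 + 12]       # pruning the node merges its children into one leaf
    print(misclassified(children[0]) + misclassified(children[1]))  # errors before pruning: 5 + 3 = 8
    print(misclassified(merged))                                    # errors after pruning: 17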
Table 1: The dataset consists of 4 categorical features (F1 ∈ {0, 1}, F2 ∈ {0, 1}, F3 ∈ {0, 1, 2}, F4 ∈ {0, 1}) and a binary classification target with labels {−, +}.

    F1  F2  F3  F4 | Instances (−)  Instances (+)
     0   0   0   0 |       0              10
     0   0   0   1 |       5               5
     0   0   1   0 |      10               0
     0   0   1   1 |      10               0
     0   0   2   0 |       0              10
     0   0   2   1 |       0              10
     0   1   0   0 |      10               0
     0   1   0   1 |      10               0
     0   1   1   0 |       5               5
     0   1   1   1 |       0              10
     0   1   2   0 |      10               0
     0   1   2   1 |       0              10
     1   0   0   0 |       0              10
     1   0   0   1 |      10               0
     1   0   1   0 |       5               5
     1   0   1   1 |       0              10
     1   0   2   0 |       0              10
     1   0   2   1 |       0              10
     1   1   0   0 |      10               0
     1   1   0   1 |      10               0
     1   1   1   0 |      10               0
     1   1   1   1 |      10               0
     1   1   2   0 |       0              10
     1   1   2   1 |       0              10
[Figure 1 as a text tree; "no" = rule does not apply (left child), "yes" = rule applies (right child):]

    odor = none?                      edible = 2948, poisonous = 2738
      no:  bruises = false?           edible = 577,  poisonous = 2663
        no:  odor = foul?             edible = 577,  poisonous = 386
          no:  odor = pungent?        edible = 577,  poisonous = 176
            no:  leaf                 edible = 577,  poisonous = 0
            yes: leaf                 edible = 0,    poisonous = 176
          yes: leaf                   edible = 0,    poisonous = 210
        yes: leaf                     edible = 0,    poisonous = 2277
      yes: cap-color = yellow?        edible = 2371, poisonous = 75
        no:  cap-shape = bell?        edible = 2371, poisonous = 60
          no:  cap-color = pink?      edible = 2268, poisonous = 33
            no:  leaf                 edible = 2222, poisonous = 24
            yes: leaf                 edible = 46,   poisonous = 9
          yes: bruises = false?       edible = 103,  poisonous = 27
            no:  leaf                 edible = 0,    poisonous = 25
            yes: leaf                 edible = 103,  poisonous = 2
        yes: leaf                     edible = 0,    poisonous = 15
Figure 1: Decision tree of depth 4 that classifies mushrooms as edible or poisonous. Each inner node shows its decision rule at the top; the left child is chosen if the rule does not apply and the right child if it applies. For each leaf and inner node, the class frequencies on the training data are given.
3 Regression with Decision Trees and kNN
For all students other than those in the B.Sc. Data Science program.
Decision trees and k-nearest neighbors (kNN) are not limited to classification but can also be used for regression. Research and explain the following two questions in sufficient detail:
1. How does the construction of regression trees differ from that of classification trees? How is a prediction computed in a regression tree?
2. How can kNN be used for regression?
Please cite your sources.
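As a starting point for question 2: in kNN regression, the prediction for a query point is the (optionally distance-weighted) average of the target values of its k nearest training points. A minimal sketch with synthetic data, assuming scikit-learn:

    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(100, 1))        # 1-d inputs
    y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)

    knn = KNeighborsRegressor(n_neighbors=5)     # predict the mean of the 5 nearest targets
    knn.fit(X, y)
    print(knn.predict([[3.0]]))                  # roughly sin(3.0)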