1 Inductive Construction
Given the dataset in Table 1, construct a decision tree by hand using the top-down algorithm presented in the lecture. Your stopping criteria are zero entropy or a depth of 2 (the root node is at depth 0, the first layer of inner nodes is at depth 1, ...).
Draw your final decision tree and provide your computation steps (information gain for each relevant attribute at each split, class frequencies for each node, ...). Compute the error rate of your decision tree on the training data.
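If you want to check your hand computation, entropy and information gain can be computed from class counts alone. The following Python sketch is purely illustrative; the helper names are not part of the lecture material:

    from math import log2

    def entropy(counts):
        """Entropy of a class distribution given as a list of class counts, e.g. [115, 125]."""
        total = sum(counts)
        return -sum(c / total * log2(c / total) for c in counts if c > 0)

    def information_gain(parent_counts, child_counts_per_value):
        """Parent entropy minus the weighted entropies of the children created by a split."""
        total = sum(parent_counts)
        remainder = sum(sum(child) / total * entropy(child)
                        for child in child_counts_per_value)
        return entropy(parent_counts) - remainder

    # Example: entropy at the root of Table 1 (summing the Instances columns gives 115 "-" and 125 "+")
    print(entropy([115, 125]))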
2 Minimal Error Pruning
Figure 1 shows a decision tree for classifying mushrooms as either edible or poisonous. We used the same dataset as shown in the lecture and constructed the tree with scikit-learn with a maximum depth of 4. Because scikit-learn implements a slightly different algorithm (CART; in the lecture: ID3), which also allows continuous data, we apply one-hot encoding to each variable, resulting in a binary decision tree.
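For reference, a tree like the one in Figure 1 could be produced roughly as follows; this is only a sketch, and the file name and column names are assumptions rather than the exact lecture setup:

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier, export_text

    df = pd.read_csv("mushrooms.csv")              # hypothetical file with categorical columns
    X = pd.get_dummies(df.drop(columns="class"))   # one-hot encoding -> binary features
    y = df["class"]                                # "edible" / "poisonous"

    clf = DecisionTreeClassifier(criterion="entropy", max_depth=4)
    clf.fit(X, y)
    print(export_text(clf, feature_names=list(X.columns)))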
Prune two decisions (inner nodes) of the decision tree based on the error rate. First, compute the overall error rate of the original tree. Second, for each viable node, compute the error rate of the tree that results from removing it. Third, prune the node whose removal yields the lowest error rate. Repeat this process to remove a second inner node.
Draw the decision tree after each pruning step and provide your calculations for each step.
Note: A node is viable for pruning if all of its children are leaf nodes. After pruning, the chosen inner node turns into a leaf node. We define the overall error rate as (slide 44):

\[
\text{error rate} = \frac{\sum_{\text{leaf } \ell} \text{number of misclassified samples in } \ell}{\text{total number of samples}}
\]
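As a sanity check, the overall error rate and the effect of a single pruning step can be computed from the leaf class counts alone. The sketch below uses made-up counts for illustration:

    def misclassified(counts):
        """Misclassified samples in a leaf that predicts its majority class."""
        return sum(counts) - max(counts)

    def error_rate(leaf_counts, n_total):
        """Misclassified samples summed over all leaves, divided by the total number of samples."""
        return sum(misclassified(c) for c in leaf_counts) / n_total

    # Hypothetical counts [edible, poisonous] of the two leaves under a candidate node:
    children = [[40, 5], [3, 12]]
    merged = [40 + 3, 5 + 12]       # pruning the node merges its children into one leaf
    print(misclassified(children[0]) + misclassified(children[1]))  # errors before pruning: 5 + 3 = 8
    print(misclassified(merged))                                    # errors after pruning: 17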
Table 1: The dataset consists of 4 categorical features (F1 ∈ {0, 1}, F2 ∈ {0, 1}, F3 ∈ {0, 1, 2}, F4 ∈ {0, 1}) and a binary classification target with labels {−, +}.

    F1  F2  F3  F4 | Instances (−)  Instances (+)
     0   0   0   0 |       0              10
     0   0   0   1 |       5               5
     0   0   1   0 |      10               0
     0   0   1   1 |      10               0
     0   0   2   0 |       0              10
     0   0   2   1 |       0              10
     0   1   0   0 |      10               0
     0   1   0   1 |      10               0
     0   1   1   0 |       5               5
     0   1   1   1 |       0              10
     0   1   2   0 |      10               0
     0   1   2   1 |       0              10
     1   0   0   0 |       0              10
     1   0   0   1 |      10               0
     1   0   1   0 |       5               5
     1   0   1   1 |       0              10
     1   0   2   0 |       0              10
     1   0   2   1 |       0              10
     1   1   0   0 |      10               0
     1   1   0   1 |      10               0
     1   1   1   0 |      10               0
     1   1   1   1 |      10               0
     1   1   2   0 |       0              10
     1   1   2   1 |       0              10
[Figure 1 as a text tree; "no" = rule does not apply (left child), "yes" = rule applies (right child):]

    odor = none?                      edible = 2948, poisonous = 2738
      no:  bruises = false?           edible = 577,  poisonous = 2663
        no:  odor = foul?             edible = 577,  poisonous = 386
          no:  odor = pungent?        edible = 577,  poisonous = 176
            no:  leaf                 edible = 577,  poisonous = 0
            yes: leaf                 edible = 0,    poisonous = 176
          yes: leaf                   edible = 0,    poisonous = 210
        yes: leaf                     edible = 0,    poisonous = 2277
      yes: cap-color = yellow?        edible = 2371, poisonous = 75
        no:  cap-shape = bell?        edible = 2371, poisonous = 60
          no:  cap-color = pink?      edible = 2268, poisonous = 33
            no:  leaf                 edible = 2222, poisonous = 24
            yes: leaf                 edible = 46,   poisonous = 9
          yes: bruises = false?       edible = 103,  poisonous = 27
            no:  leaf                 edible = 0,    poisonous = 25
            yes: leaf                 edible = 103,  poisonous = 2
        yes: leaf                     edible = 0,    poisonous = 15
Figure 1: Decision tree of depth 4 that classifies mushrooms as edible or poisonous. Each inner node shows its decision rule at the top; the left child is chosen if the rule does not apply and the right child if it applies. For each leaf and inner node, the class frequencies on the training data are given.
3 Regression with Decision Trees and kNN
For all students other than those in the B.Sc. Data Science program.
Decision trees and k-nearest neighbors (kNN) are not limited to classification but can also be used for regression. Research and explain the following two questions in sufficient detail:
1. How does the construction of regression trees differ from that of classification trees? How is a prediction computed in a regression tree?
2. How can kNN be used for regression?
Please cite your sources.
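As a starting point for question 2: in kNN regression, the prediction for a query point is the (optionally distance-weighted) average of the target values of its k nearest training points. A minimal sketch with synthetic data, assuming scikit-learn:

    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(100, 1))        # 1-d inputs
    y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)

    knn = KNeighborsRegressor(n_neighbors=5)     # predict the mean of the 5 nearest targets
    knn.fit(X, y)
    print(knn.predict([[3.0]]))                  # roughly sin(3.0)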