1 Decision Trees [20 pts]
1.1 Decision Trees and Boolean Expressions [10 pts]
A Boolean function can be represented by a decision tree; i.e., the decision tree equivalent to a Boolean function outputs 1 if the expression is satisfied and 0 otherwise. Please write down a decision tree that is equivalent to the following:
$(C \land \lnot A \land \lnot B \land \lnot E) \lor (C \land A \land D) \lor (\lnot C \land B)$
1.2 Train a Decision Tree [10 pts]
Below is a dataset recording the survival of passengers on the Titanic based on three attributes: Class (C), Gender (G), and Age (A). Train a depth-1 decision tree (i.e., a decision tree with only 1 decision node) using information gain as the splitting criterion, where $IG(S, x) = H(S) - H(S \mid x)$ and $x \in \{C, G, A\}$. Which node is at the root of the tree? Show all your work (i.e., include all intermediate steps) in calculating the information gain for each attribute. Round to 4 decimal places in all calculations, and use log base 2. A short Python sketch of these computations follows the tables below.
Class    Gender   Age     No     Yes   Total
1st      Male     Child   0      5     5
1st      Male     Adult   118    57    175
1st      Female   Child   0      1     1
1st      Female   Adult   4      140   144
Lower    Male     Child   35     24    59
Lower    Male     Adult   1211   281   1492
Lower    Female   Child   17     27    44
Lower    Female   Adult   105    176   281
Total:                    1490   711   2201
The marginal counts for each attribute are:

Class    No     Yes   Total
1st      122    203   325
Lower    1368   508   1876

Gender   No     Yes   Total
Male     1364   367   1731
Female   126    344   470

Age      No     Yes   Total
Child    52     57    109
Adult    1438   654   2092
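As a sanity check on the arithmetic, here is a minimal Python sketch of $H(S)$ and $IG(S, x)$ as defined above (the helper names `entropy` and `info_gain` are ours, not part of the assignment):

```python
import math

def entropy(counts):
    """H(S) in bits for a list of class counts, e.g. [#No, #Yes]."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def info_gain(parent_counts, splits):
    """IG(S, x) = H(S) - H(S|x); `splits` holds the class counts
    for each value of the attribute x."""
    total = sum(parent_counts)
    h_cond = sum(sum(s) / total * entropy(s) for s in splits)
    return entropy(parent_counts) - h_cond

# e.g. info_gain([1490, 711], [[122, 203], [1368, 508]]) is IG(S, Class)
```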
2 K-nearest neighbor [30 pts]
Implement k-nearest neighbours from scratch using any programming language of your choice. Download the dataset (knn-dataset.zip) posted on the course web page, which contains the training data and labels for each fold. Classify each input x according to the most frequent class among its k nearest neighbours, as measured by Euclidean distance (the L2 norm). Break ties at random. Test the algorithm with 10-fold cross-validation. A minimal implementation sketch is given after the hand-in list below.
What to hand in:
- Your code for k-nearest neighbour and cross-validation.
- Find the best k by 10-fold cross-validation, and draw a graph that shows the accuracy as k increases from 1 to 30.
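A minimal sketch of the classifier and the cross-validation loop, in Python with NumPy; it assumes the ten folds have already been loaded into lists of per-fold feature and label arrays (`folds_X` and `folds_y` are our names), so adapt the I/O to the files in knn-dataset.zip:

```python
import numpy as np

def knn_predict(train_X, train_y, x, k, rng):
    """Majority vote among the k nearest training points under
    Euclidean (L2) distance, breaking ties uniformly at random."""
    dists = np.linalg.norm(train_X - x, axis=1)
    neighbours = train_y[np.argsort(dists)[:k]]
    labels, votes = np.unique(neighbours, return_counts=True)
    winners = labels[votes == votes.max()]
    return rng.choice(winners)  # random tie-break

def cross_validate(folds_X, folds_y, k, rng):
    """Mean accuracy over 10-fold cross-validation: each fold is held out
    once as the test set while the remaining folds form the training set."""
    accs = []
    for i in range(len(folds_X)):
        train_X = np.vstack([f for j, f in enumerate(folds_X) if j != i])
        train_y = np.concatenate([f for j, f in enumerate(folds_y) if j != i])
        preds = np.array([knn_predict(train_X, train_y, x, k, rng)
                          for x in folds_X[i]])
        accs.append(np.mean(preds == folds_y[i]))
    return float(np.mean(accs))
```

Sweeping k from 1 to 30 through `cross_validate` yields the points for the requested accuracy graph.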
3 Perceptron [20 pts]
Consider a threshold perceptron that predicts $y = 1$ when $w^T x + w_0 \ge 0$ and $y = 0$ when $w^T x + w_0 < 0$. It is interesting to study the class of Boolean functions that can be represented by a threshold perceptron.
Assume that the input space is $X = \{0, 1\}^2$ and the output space is $Y = \{0, 1\}$. For each of the following Boolean functions, indicate whether it is possible to encode the function as a threshold perceptron. If it is possible, indicate some values for $w$ and $w_0$. If it is not possible, indicate a feature mapping $\phi : X \to \hat{X}$ with values for $w$ and $w_0$ such that $w^T \phi(x) + w_0$ is a linear separator that encodes the function. A small utility for checking candidate encodings is sketched after the list.
- and
- or
- exclusive-or
- iff
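Since the input space contains only four points, a candidate $(w, w_0)$ can be checked exhaustively; a minimal Python sketch (the helper name `encodes` is ours, not part of the assignment):

```python
from itertools import product

def encodes(w, w0, f):
    """True iff the threshold perceptron (predict 1 when w.x + w0 >= 0,
    else 0) matches the Boolean function f on every input in {0, 1}^2."""
    return all(
        (sum(wi * xi for wi, xi in zip(w, x)) + w0 >= 0) == bool(f(*x))
        for x in product((0, 1), repeat=2)
    )

# e.g. encodes((w1, w2), w0, lambda a, b: a and b) tests a candidate for AND
```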
4 Empirical Evaluation [30 pts]
In this exercise, we will be working with the Breast Cancer Wisconsin (Diagnostic) Dataset (WDBC) from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets.html). The dataset consists of 569 samples of biopsied tissue. The tissue for each sample is imaged, and 10 characteristics of the nuclei of the cells present in each image are measured:
- radius (mean of distances from center to points on the perimeter)
- texture (standard deviation of gray-scale values)
- perimeter
- area
- smoothness (local variation in radius lengths)
- compactness (perimeter^2 / area - 1.0)
- concavity (severity of concave portions of the contour)
- concave points (number of concave portions of the contour)
- symmetry
- fractal dimension ("coastline approximation" - 1)
Each of the 569 samples in the dataset consists of a feature vector of length 30. The first 10 entries in this feature vector are the means of the characteristics listed above for each image; the second 10 are the standard deviations, and the last 10 are the largest values of each of these characteristics present in each image. Each sample is also associated with a classification label, which is M for malignant and B for benign.
The dataset has already been divided into a training set (wdbc-train.csv) and test set (wdbc-test.csv), which you can download from the Assignments page on the course website.
Use your favourite machine learning package and train (a) a decision tree and (b) a kNN classifier on the WDBC dataset, using all 30 features. Choose the best k by 10-fold cross-validation. Make sure you shuffle the training data randomly before splitting it into groups to be used for cross-validation.
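As an illustration, a minimal scikit-learn sketch; the column layout of wdbc-train.csv (a "label" column plus 30 feature columns) is an assumption here, so adjust the parsing to the actual file:

```python
import pandas as pd
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumed layout: a "label" column (M/B) plus 30 feature columns.
train = pd.read_csv("wdbc-train.csv")
X, y = train.drop(columns=["label"]).values, train["label"].values

# (a) decision tree on all 30 features
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# (b) kNN, with k chosen by 10-fold CV; folds are shuffled before splitting
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": range(1, 31)}, cv=cv)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```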
Report the training performance using 10-fold cross validation, in terms of precision, recall, accuracy, sensitivity and specificity. Assume that M is the positive class.
Report the testing performance, in terms of precision, recall, accuracy, sensitivity and specificity. Assume that M is the positive class.
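For reference, counting true/false positives and negatives with M as the positive class, the requested metrics are (note that recall and sensitivity are the same quantity):

```latex
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \text{sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{specificity} = \frac{TN}{TN + FP}
```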
Suppose you were to recommend an algorithm to a conservative doctor who is afraid of making a mistake in diagnosis. Which classifier would you recommend? Explain why.