Assignment #2 Solution


Note 1: Your submission header must follow the format shown in the rounded rectangle above.

Note 2: Homework is to be done individually. You may discuss the homework problems with your fellow students, but you are NOT allowed to copy – either in part or in whole – anyone else’s answers.

Note 3: Your deliverable should be a .pdf file submitted through Gradescope by the deadline. Do not forget to assign a page to each of your answers when making a submission. In addition, your source code (.py files) should be added to an online repository (e.g., GitHub) so that it can be downloaded and executed later.

Note 4: All submitted materials must be legible. Figures/diagrams must have good quality.

Note 5: Please use and check the Canvas discussion for further instructions, questions, answers, and hints.

    1. [16 points] Considering that ID3 built the decision tree below after analyzing a given training set, answer the following questions:


[Figure: ID3 decision tree. The root node splits on Tear (branches Normal and Reduced); the internal nodes split on Spectacle (branches Myope and Hypermetrope) and Astigmatism; the leaves are the Yes/No predictions for Lenses.]

    a) [12 points] What is the accuracy of this model if applied to the test set below? You must report the values of True Positives, True Negatives, False Positives, and False Negatives for full credit.

Age            Spectacle      Astigmatism   Tear      Lenses (ground truth)
Young          Hypermetrope   Yes           Normal    Yes
Young          Hypermetrope   No            Normal    Yes
Young          Myope          No            Reduced   No
Presbyopic     Hypermetrope   No            Reduced   No
Presbyopic     Myope          No            Normal    No
Presbyopic     Myope          Yes           Reduced   No
Prepresbyopic  Myope          Yes           Normal    Yes
Prepresbyopic  Myope          No            Reduced   No

    b) [4 points] What are the precision, recall, and F1-measure of this model when applied to the same test set?
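Hint (assuming Lenses = Yes is treated as the positive class): the standard confusion-matrix definitions are

Accuracy  = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall    = TP / (TP + FN)
F1        = 2 * Precision * Recall / (Precision + Recall)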

    2. [15 points] Complete the Python program (decision_tree.py) that will read the files contact_lens_training_1.csv, contact_lens_training_2.csv, and contact_lens_training_3.csv. Each of those training sets has a different number of instances. You will observe that the trees are now created with the parameter max_depth = 3, which is used in sklearn to define the maximum depth of the tree (a pre-pruning strategy). Your goal is to train, test, and output the performance of the model created from each training set on the test set provided (contact_lens_test.csv). You must repeat this process 10 times for each training set (train and test), choosing the lowest accuracy as the final classification performance of that model.
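One possible way to complete decision_tree.py is sketched below as a minimal illustration. The dictionary encoding of the categorical values, the assumption that every CSV file has a header row with the class label in the last column, and the file handling are assumptions rather than the official template.

import csv
from sklearn.tree import DecisionTreeClassifier

# Illustrative encoding of the categorical attribute values (assumption, not the official template).
encode = {'Young': 1, 'Prepresbyopic': 2, 'Presbyopic': 3,
          'Myope': 1, 'Hypermetrope': 2,
          'Yes': 1, 'No': 2,
          'Normal': 1, 'Reduced': 2}

def read_csv(path):
    # Assumes a header row and the class label (Yes/No) in the last column.
    with open(path) as f:
        rows = list(csv.reader(f))[1:]
    X = [[encode[v] for v in r[:-1]] for r in rows]
    y = [encode[r[-1]] for r in rows]
    return X, y

X_test, y_test = read_csv('contact_lens_test.csv')

for train_file in ('contact_lens_training_1.csv',
                   'contact_lens_training_2.csv',
                   'contact_lens_training_3.csv'):
    X_train, y_train = read_csv(train_file)
    lowest = 1.0
    for _ in range(10):  # train and test 10 times, keep the lowest accuracy
        clf = DecisionTreeClassifier(criterion='entropy', max_depth=3).fit(X_train, y_train)
        hits = sum(p == t for p, t in zip(clf.predict(X_test), y_test))
        lowest = min(lowest, hits / len(y_test))
    print(f'Final accuracy when training on {train_file}: {lowest:.2f}')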

    3. [32 points] Consider the dataset below to answer the following questions:


[Figure: scatter plot of the two-class points (the data from binary_points.csv) on the x and y axes.]

        a. [4 points] What is the leave-one-out cross-validation error rate (LOO-CV) for 1NN? Use Euclidean distance as your distance measure and the error rate calculated as:

error rate = (number of misclassified instances) / (total number of instances)


        b. [4 points] What is the leave-one-out cross-validation error rate (LOO-CV) for 3NN?

        c. [4 points] What is the leave-one-out cross-validation error rate (LOO-CV) for 9NN?

        d. [5 points] Draw the decision boundary learned by the 1NN algorithm.

        e. [15 points] Complete the Python program (knn.py) that will read the file binary_points.csv and output the LOO-CV error rate for 1NN (same answer as part a).
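A minimal sketch of one possible approach for knn.py is shown below; it assumes binary_points.csv has a header row and that each remaining row holds the x coordinate, the y coordinate, and the class label, in that order (these layout details are assumptions, not the official template).

import csv
import math

with open('binary_points.csv') as f:
    rows = list(csv.reader(f))
data = [(float(r[0]), float(r[1]), r[2]) for r in rows[1:]]  # skip the assumed header row

errors = 0
for i, (xi, yi, ci) in enumerate(data):
    # Leave instance i out and find its single nearest neighbour among the rest.
    best_dist, best_class = None, None
    for j, (xj, yj, cj) in enumerate(data):
        if i == j:
            continue
        d = math.hypot(xi - xj, yi - yj)  # Euclidean distance
        if best_dist is None or d < best_dist:
            best_dist, best_class = d, cj
    if best_class != ci:  # the 1NN prediction differs from the true label
        errors += 1

print('LOO-CV error rate for 1NN:', errors / len(data))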

    4. [12 points] Find the class of instance #10 below following the 3NN strategy. Use Euclidean distance as your distance measure. You must show all your calculations for full credit (an illustrative distance-computation sketch follows the table below).

ID    Red   Green   Blue   Class
#1    220   20      60     1
#2    255   99      21     1
#3    250   128     14     1
#4    144   238     144    2
#5    107   142     35     2
#6    46    139     87     2
#7    64    224     208    3
#8    176   224     23     3
#9    100   149     237    3
#10   154   205     50     ?
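To illustrate the kind of calculation expected (not a substitute for showing the calculations by hand), the sketch below computes the Euclidean distance from instance #10 to every labelled instance in the table above and takes a majority vote over the three smallest distances:

import math
from collections import Counter

# RGB values and classes taken from the table above.
train = {
    '#1': ((220,  20,  60), 1), '#2': ((255,  99,  21), 1), '#3': ((250, 128,  14), 1),
    '#4': ((144, 238, 144), 2), '#5': ((107, 142,  35), 2), '#6': (( 46, 139,  87), 2),
    '#7': (( 64, 224, 208), 3), '#8': ((176, 224,  23), 3), '#9': ((100, 149, 237), 3),
}
query = (154, 205, 50)  # instance #10

# Euclidean distance from #10 to every labelled instance (math.dist requires Python 3.8+).
dists = sorted((math.dist(query, rgb), label, cls) for label, (rgb, cls) in train.items())
nearest = dists[:3]  # the 3 nearest neighbours
for d, label, cls in nearest:
    print(f'{label}: distance {d:.2f}, class {cls}')
print('3NN prediction:', Counter(cls for _, _, cls in nearest).most_common(1)[0][0])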
    5. [25 points] Use the dataset below to answer the following questions:

[Table: weather / PlayTennis training dataset (attributes Day, Outlook, Temperature, Humidity, Wind; class PlayTennis).]

    a) [10 points] Classify the instance ‹D15, Sunny, Mild, Normal, Weak› following the Naïve Bayes strategy. Show all your calculations up to the final normalized probability values.
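As a reminder of the procedure, Naïve Bayes scores each class (PlayTennis = Yes or No) as the class prior multiplied by the per-attribute conditional probabilities estimated from the training data, and then normalizes the scores so they sum to 1:

P(c | Sunny, Mild, Normal, Weak) ∝ P(c) × P(Sunny | c) × P(Mild | c) × P(Normal | c) × P(Weak | c),  c ∈ {Yes, No}

The normalized probability of each class is its score divided by the sum of the two scores.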

    b) [15 points] Complete the Python program (naïve_bayes.py) that will read the file weather_training.csv (training set) and output the classification of each test instance from the file weather_test (test set) if the classification confidence is >= 0.75 (see the illustrative sketch after the sample output below). Sample output:

Day   Outlook   Temperature   Humidity   Wind   PlayTennis   Confidence
D15   Sunny     Hot           High       Weak   No           0.86
D16   Sunny     Mild          High       Weak   Yes          0.78
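A minimal sketch of one possible implementation of naïve_bayes.py is shown below. The test file name (weather_test.csv), the column layout (Day, Outlook, Temperature, Humidity, Wind, PlayTennis), and the use of sklearn's CategoricalNB with an OrdinalEncoder are assumptions; the official skeleton may expect a different (e.g., manual) implementation.

import csv
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

def read_rows(path):
    # Assumes a header row followed by one instance per line.
    with open(path) as f:
        return list(csv.reader(f))[1:]

train = read_rows('weather_training.csv')
test = read_rows('weather_test.csv')  # test file name assumed

enc = OrdinalEncoder()
X_train = enc.fit_transform([r[1:5] for r in train])  # Outlook..Wind; Day column dropped
y_train = [r[5] for r in train]

clf = CategoricalNB().fit(X_train, y_train)

print('Day Outlook Temperature Humidity Wind PlayTennis Confidence')
for row in test:
    probs = clf.predict_proba(enc.transform([row[1:5]]))[0]
    confidence = probs.max()
    if confidence >= 0.75:  # report only confident classifications
        print(' '.join(row[:5]), clf.classes_[probs.argmax()], round(confidence, 2))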





Important Note: Answers to all questions should be written clearly and concisely, and each answer must be unmistakably delineated. You may resubmit multiple times until the deadline (the last submission will be considered).

NO LATE ASSIGNMENTS WILL BE ACCEPTED. ALWAYS SUBMIT WHATEVER YOU HAVE COMPLETED FOR PARTIAL CREDIT BEFORE THE DEADLINE!
