In this assignment, you will implement the ID3 algorithm for learning decision trees. You may assume that the class label and all attributes are binary (only 2 values). Please use the provided skeleton code in Python to implement the algorithm.
The ID3 algorithm is similar to what we discussed in class: start with an empty tree and build it recursively. Use information gain to select the attribute to split on. (Do not divide by split information.) Use a threshold on the information gain to determine when to stop. The full algorithm is described in this paper: http://dept.cs.williams.edu/ andrea/cs374/Articles/Quinlan.pdf
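As an illustration of the splitting criterion described above, here is a minimal sketch of entropy and information gain for binary attributes and a binary class label. The function names and the row layout (a list of 0/1 integers with the class label last) are assumptions made for this example; they are not taken from the starter code.

```python
import math


def entropy(pos, neg):
    """Entropy (in bits) of a binary label distribution, given class counts."""
    total = pos + neg
    if total == 0:
        return 0.0
    result = 0.0
    for count in (pos, neg):
        if count > 0:
            p = count / total
            result -= p * math.log2(p)
    return result


def information_gain(rows, attr_index):
    """Information gain of splitting `rows` on one binary attribute.

    Assumed layout: each row is a list of 0/1 ints, class label last.
    """
    def counts(subset):
        pos = sum(1 for r in subset if r[-1] == 1)
        return pos, len(subset) - pos

    parent = entropy(*counts(rows))
    remainder = 0.0
    for value in (0, 1):
        subset = [r for r in rows if r[attr_index] == value]
        if subset:  # skip empty branches to avoid dividing by zero
            remainder += len(subset) / len(rows) * entropy(*counts(subset))
    return parent - remainder


# Toy data made up for illustration: attribute 0 determines the label,
# attribute 1 is independent of it.
rows = [[0, 0, 1], [0, 1, 1], [1, 0, 0], [1, 1, 0]]
print(information_gain(rows, 0))
print(information_gain(rows, 1))
```

In a full implementation, the attribute with the highest gain would be chosen at each node, and the recursion would stop once the best gain falls below the chosen threshold.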
You may look at open-source reference implementations, such as WEKA, but please do not copy code from open-source projects. Your code must be your own. Undergraduates may complete the assignment in a team of 2. Graduates must complete the assignment alone.
The starter code is provided at http://nlp.uoregon.edu/download/cis472/hw2-starter-code.zip. You should write your code in Python 3. The code should run from the command line on ix.cs.uoregon.edu and accept the following arguments:
python3 id3.py <train> <test> <model>
where <train> and <test> are the paths to the files containing the training data and the testing data, and <model> is the path to the file where you will save the decision tree model.
The data files are in CSV format. The first line lists the names of the attributes. The last attribute is the class label.
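The file layout described above can be read with Python's standard csv module. The sketch below is only an illustration of the format; the starter code has its own input handling, which may differ. The function name read_data, the column name "class", and the sample values are hypothetical.

```python
import csv
import io


def read_data(handle):
    """Return (names, rows); the last column of each row is the class label."""
    reader = csv.reader(handle)
    names = next(reader)  # first line: attribute names, class label last
    rows = [[int(v) for v in line] for line in reader if line]
    return names, rows


# In-memory sample standing in for a data file (values made up).
sample = io.StringIO("wesley,honor,barclay,class\n0,0,0,1\n1,0,1,0\n")
names, rows = read_data(sample)
print(names[:-1])  # attribute names
print(names[-1])   # name of the class-label column
print(rows)
```

With a real file, the same function could be called as read_data(open(path, newline="")).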
We are providing skeleton code in Python that handles input, output, and some of the internal data structures. Please use it as your starting point, because we will use that API for grading.
For saving model files, please use the following format:
wesley = 0 :
| honor = 0 :
| | barclay = 0 : 1
According to this tree, if wesley = 0 and honor = 0 and barclay = 0, then the class value of the corresponding instance should be 1. In other words, the value appearing before a colon is an attribute value, and the value appearing after a colon is a class value.
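One way to produce text in this style is sketched below. It assumes that each level of nesting adds one "| " prefix and that a leaf's class value is written after the colon on the same line as the last attribute test. The tuple-based tree representation and the name format_tree are hypothetical, and the class values other than the one stated above are made up; the starter code's own data structures should take precedence.

```python
def format_tree(node, depth=0):
    """Render a tree as text lines in the 'attr = value :' style.

    Hypothetical representation: a leaf is an int (the class value); an
    internal node is a tuple (attr_name, subtree_for_0, subtree_for_1).
    The root is assumed to be an internal node.
    """
    lines = []
    attr, children = node[0], node[1:]
    for value, child in enumerate(children):
        prefix = "| " * depth + f"{attr} = {value} :"
        if isinstance(child, int):
            lines.append(f"{prefix} {child}")  # leaf: class value after colon
        else:
            lines.append(prefix)
            lines.extend(format_tree(child, depth + 1))
    return lines


tree = ("wesley", ("honor", ("barclay", 1, 0), 0), 0)
print("\n".join(format_tree(tree)))
```

Writing the joined lines to the <model> path would then save the tree in a single step.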
Once you are done coding, you may want to check your code with this autograder: https://ix.cs.uoregon.edu/ vietl/cis472/hw2/upload.html. You should submit a single file named id3.py. Testing may take up to 500 seconds per submission.
You should submit a single file, id3.py, through Canvas.