• Overview
In this assignment, you will use Weka 3.8 [1] to perform several tasks on a dataset. The aim is to familiarize you with certain machine learning algorithms and with Weka. Weka is a tool that provides a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset through the Weka desktop application or be called from your own Java code.
• Tasks
You will use the German Credit Risk dataset, which is provided as an attachment in ODTUClass. Each entry in the dataset represents a person who takes credit from a bank. Each person is classified as a "good" or "bad" credit risk according to a set of attributes, and you are expected to classify instances according to their credit risk. You can find more information about the dataset at the following link [2]. However, please download the attached dataset, as it is slightly different from the one at the link.
2.1 Preprocessing (20 Points)
In this section, you are expected to perform data transformations such as handling the missing values in the dataset. The mini-tasks are explained in the "ceng414 hw1.ipynb" file, and you must write your solutions in that file. At the end of this phase, you must save the final dataset as "credit wo na.csv".
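As a rough illustration of what handling missing values can look like in pandas, the sketch below fills numeric gaps with the column median and categorical gaps with the most frequent value. The column names, the sample rows, and the choice of median/mode imputation are all assumptions made for this example; the actual mini-tasks are the ones described in the notebook.

```python
import io

import pandas as pd

# Hypothetical sample; the real column names come from the attached dataset.
RAW_CSV = """age,housing,credit_amount,risk
35,own,1200,good
,rent,5000,bad
52,NA,,good
41,own,3100,good
"""


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with NaNs replaced: median for numeric columns, mode otherwise."""
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


# Empty cells and the string "NA" are parsed as NaN by default.
df = pd.read_csv(io.StringIO(RAW_CSV))
clean = fill_missing(df)

# With no path argument, to_csv returns the CSV text; pass a file path to write to disk.
csv_text = clean.to_csv(index=False)
print(clean.isna().sum().sum())
```

Dropping incomplete rows with `DataFrame.dropna()` is another common option; which strategy is appropriate depends on the instructions in the notebook.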
[1] https://waikato.github.io/weka-wiki/downloading_weka/
[2] https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)
2.2 Multi-Layer Perceptron (35 Points)
For this task, you will use the "credit wo na.csv" dataset in Weka. Under the Classify tab in the Explorer window, choose the MultilayerPerceptron classifier. Run the classifier with its default parameters and note them. Report your results under 5-fold cross-validation. Answer the following questions based on this run:
1. How many hidden layers and hidden nodes were created?
2. Did Weka normalize the attributes? What is the effect of normalizing the attributes?
3. Which halting strategy did the MLP use?
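To make the idea of 5-fold cross-validation concrete, the sketch below splits a toy label list into five folds while keeping the class ratio roughly equal in each fold. It is only an illustration in plain Python, not Weka's implementation; the function name `stratified_folds`, the seed, and the 70/30 toy labels are assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=1):
    """Assign each instance index to one of k folds, keeping class ratios similar."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Toy labels with a 70/30 class mix (hypothetical, not the real data).
labels = ["good"] * 70 + ["bad"] * 30
folds = stratified_folds(labels, k=5)

# Each fold serves once as the test set; the other four folds form the training set.
for i, test_idx in enumerate(folds):
    test_set = set(test_idx)
    train_idx = [j for j in range(len(labels)) if j not in test_set]
    n_bad = sum(labels[j] == "bad" for j in test_idx)
    print(f"fold {i}: train={len(train_idx)} test={len(test_idx)} bad_in_test={n_bad}")
```

The reported score is then the performance aggregated over the five test folds, so every instance is used for testing exactly once.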
2.3 Decision Tree (25 Points)
Open the Explorer in the Weka GUI and load "credit wo na.csv". Go to the Classify tab and choose the J48 classifier under trees. Run the classifier without changing the default parameters and report your results under 5-fold cross-validation. Your report should include the pruned tree, the Summary, and the Detailed Accuracy By Class output. In addition, include the visualization of the tree in your report.
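J48 is Weka's implementation of the C4.5 decision tree algorithm, which chooses splits using entropy-based criteria. The sketch below computes entropy, information gain, and gain ratio for a nominal attribute on a tiny toy sample. It is a simplified illustration, not J48's code, and the attribute names and values are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of the value distribution in items."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())


def information_gain(values, labels):
    """Reduction in label entropy after splitting on a nominal attribute."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalized by the entropy of the split itself."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


# Hypothetical toy sample: "housing" separates the classes, "coin" does not.
risk = ["good", "good", "bad", "bad"]
housing = ["own", "own", "rent", "rent"]
coin = ["a", "b", "a", "b"]

print(gain_ratio(housing, risk), gain_ratio(coin, risk))
```

An attribute with a higher gain ratio is a better candidate for a split, which is why an informative attribute tends to appear near the root of the tree.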
2.4 Naive Bayes (20 Points)
Naive Bayes is a simple yet powerful machine learning algorithm used for classification tasks. It is based on Bayes' theorem and assumes that the features are conditionally independent given the class label. Open the Explorer in the Weka GUI and load "credit wo na.csv". Go to the Classify tab and choose the NaiveBayes classifier from the list of available classifiers.
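The conditional independence assumption means the score of a class is its prior multiplied by one per-feature likelihood for each attribute value. The sketch below shows this for nominal features with Laplace (add-one) smoothing. It is a minimal illustration rather than Weka's NaiveBayes implementation; the function names, the smoothing choice, and the toy rows are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    feature_values = defaultdict(set)    # feature index -> distinct values seen
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, y)][v] += 1
            feature_values[i].add(v)
    return class_counts, value_counts, feature_values


def predict(model, row):
    """Return the class with the highest log prior plus summed log likelihoods."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for y, n_y in class_counts.items():
        log_p = math.log(n_y / total)  # class prior
        for i, v in enumerate(row):
            # Laplace smoothing avoids zero probability for unseen combinations.
            numerator = value_counts[(i, y)][v] + 1
            denominator = n_y + len(feature_values[i])
            log_p += math.log(numerator / denominator)
        scores[y] = log_p
    return max(scores, key=scores.get)


# Hypothetical toy data: (housing, job) -> credit risk.
rows = [("own", "skilled"), ("own", "skilled"), ("rent", "unskilled"),
        ("rent", "skilled"), ("own", "unskilled")]
labels = ["good", "good", "bad", "bad", "good"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("own", "skilled")), predict(model, ("rent", "unskilled")))
```

Working with log probabilities instead of raw products avoids numerical underflow when the number of attributes is large.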
• Submission
You are expected to submit a zip file that includes the following two documents:
• "ceng414 hw1.ipynb" file: You are expected to perform the data transformations given in this file and submit your own implementation.
• Report: You are expected to assess the performance of these classifiers in your report, including accuracy, precision, recall, and F1-measure. You can access these metrics from the "Result list" panel in the "Classify" tab. In addition, you need to answer the additional questions in Section 2. Your report must not exceed 3 pages and must be submitted in PDF format.
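For reference, the four metrics named above can be derived from the counts in a binary confusion matrix. The sketch below shows the standard formulas; the function name `binary_metrics` and the example counts are assumptions made for illustration and are not results from the dataset.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, treating one class as the positive class.
metrics = binary_metrics(tp=60, fp=20, fn=10, tn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Weka reports precision, recall, and F-measure per class, so the same formulas apply once for each class treated as the positive one.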
• Tutorials
◦ Pandas
◦ Jupyter Notebook
◦ Weka
• Regulations
◦ Submission will be done via ODTUClass. You are expected to submit a zip file containing your code and a report presenting the analysis results.
◦ Late submission is not allowed.
◦ We have a zero-tolerance policy for cheating. People involved in cheating will be punished according to the university regulations.