Foundations of Machine Learning Assignment 1

• Marks will be awarded based on the correctness and soundness of the outputs.

• Marks will be deducted in case of plagiarism.

• Proper indentation and appropriate comments (if necessary) are mandatory.

• Use of frameworks such as scikit-learn is allowed.

• All benchmarks (accuracy, etc.), answers to questions, and supporting examples should be added in a separate file named ‘report’.

• All code must be submitted in ‘.py’ format. If you write it in ‘.ipynb’ format, download it as a ‘.py’ file and then submit it.

• You should zip all the required files and name the zip file as:

◦ <roll_no>_assignment_<#>.zip, e.g. 1501cs11_assignment_01.zip

• Upload your assignment (the zip file) at the following link:

◦ https://www.dropbox.com/request/GBzzFlhrK9ZDPbtbL4S7

Problem Statement:

• The assignment aims to implement the K-Means and K-Medoids algorithms to cluster a dataset consisting of socio-economic and health factors of countries, and to determine the overall development of each country.

Implementation:

• Implement the K-Means and K-Medoids algorithms to cluster the given dataset as follows:

◦ Perform standard data preprocessing operations such as data cleaning (handling missing values) and data scaling (handling outliers).

◦ Perform 5-fold cross-validation.

◦ Classify the countries according to the following categories:

▪ Developed Country

▪ Developing Country

▪ Underdeveloped Country
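
The steps above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a reference solution. `k_medoids` and `cross_validated_clusters` are hypothetical helper names. Median imputation and `RobustScaler` are only one possible way to handle missing values and outliers. The k-medoids routine uses the simple alternating update, not the full PAM swap search, so it can stop in a local optimum. In each fold, held-out rows are assigned to the nearest centre or medoid learned from the training rows.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.impute import SimpleImputer
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler


def k_medoids(X, k, n_iter=100, seed=0):
    """Alternating k-medoids; returns (medoid row indices, cluster labels)."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: the member with the smallest total distance
                # to the other members of its cluster.
                within = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break  # converged: medoids no longer change
        medoids = new_medoids
    return medoids, np.argmin(dist[:, medoids], axis=1)


def cross_validated_clusters(X, k=3, n_splits=5, seed=0):
    """Fit both models on each training split and label the held-out rows."""
    results = []
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in folds.split(X):
        # Fit the preprocessing on the training rows only (assumed choice:
        # median imputation, then median/IQR scaling).
        prep = make_pipeline(SimpleImputer(strategy="median"), RobustScaler())
        X_train = prep.fit_transform(X[train_idx])
        X_test = prep.transform(X[test_idx])

        kmeans = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_train)

        medoid_idx, _ = k_medoids(X_train, k, seed=seed)
        centres = X_train[medoid_idx]
        to_medoids = np.linalg.norm(X_test[:, None, :] - centres[None, :, :], axis=2)

        results.append({
            "test_idx": test_idx,
            "kmeans": kmeans.predict(X_test),
            "kmedoids": np.argmin(to_medoids, axis=1),
        })
    return results
```

Here `X` is assumed to be a NumPy array holding only the numeric feature columns, with the country names removed beforehand. Cluster ids are arbitrary integers, so naming a cluster as one of the three categories still requires a separate rule, for example inspecting the feature means of each cluster.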

Dataset:

• Link to dataset: https://www.kaggle.com/datasets/rohan0301/unsupervised-learning-on-country-data

Documents to submit:

• Model code

• Accuracy, precision, recall, and F1 score for each fold

• Visualization of the clusters after the model has converged
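
The per-fold scores and the cluster plot listed above might be produced along these lines. This is a sketch under assumptions. Accuracy, precision, recall, and F1 need reference labels, which clustering alone does not provide, so `y_true` here is a hypothetical array of reference categories coded 0, 1, 2 that you would have to supply yourself. The helper names `align_clusters`, `fold_scores`, and `plot_clusters` are illustrative. Cluster ids are arbitrary, so they are first matched to the reference classes with the Hungarian algorithm. The plot uses a 2-D PCA projection, which is only one possible view of the clusters.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.decomposition import PCA
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)


def align_clusters(y_true, cluster_ids, n_classes=3):
    """Relabel clusters so they overlap the reference classes as much as possible."""
    labels = np.arange(n_classes)
    # Rows are reference classes, columns are cluster ids.
    overlap = confusion_matrix(y_true, cluster_ids, labels=labels)
    # Minimising the negated overlap maximises the number of matched samples.
    class_idx, cluster_idx = linear_sum_assignment(-overlap)
    mapping = {int(c): int(k) for c, k in zip(cluster_idx, class_idx)}
    return np.array([mapping[int(c)] for c in cluster_ids])


def fold_scores(y_true, cluster_ids, n_classes=3):
    """Accuracy and macro-averaged precision, recall, and F1 for one fold."""
    y_pred = align_clusters(y_true, cluster_ids, n_classes)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=np.arange(n_classes),
        average="macro", zero_division=0,
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def plot_clusters(X_scaled, cluster_ids, path="clusters.png"):
    """Save a scatter plot of the first two principal components, coloured by cluster."""
    import matplotlib
    matplotlib.use("Agg")  # write to a file without needing a display
    import matplotlib.pyplot as plt

    coords = PCA(n_components=2).fit_transform(X_scaled)
    fig, ax = plt.subplots()
    ax.scatter(coords[:, 0], coords[:, 1], c=cluster_ids, cmap="viridis", s=25)
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    ax.set_title("Clusters after convergence")
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return coords
```

Matching clusters to classes on the same fold that is being scored gives an optimistic estimate. A stricter variant would learn the mapping on the training rows and apply it unchanged to the held-out rows.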