• Marks will be awarded based on the correctness and soundness of the outputs.
• Marks will be deducted in case of plagiarism.
• Proper indentation and appropriate comments (if necessary) are mandatory.
• Use of frameworks such as scikit-learn is allowed.
• All benchmarks (accuracy, etc.), answers to questions, and supporting examples should be added in a separate file named ‘report’.
• All code must be submitted in ‘.py’ format. Even if you write it in ‘.ipynb’ format, export it to ‘.py’ format before submitting.
• You should zip all the required files and name the zip file as:
◦ <roll_no>_assignment_<#>.zip, e.g. 1501cs11_assignment_01.zip.
• Upload your assignment (the zip file) to the following link:
◦ https://www.dropbox.com/request/GBzzFlhrK9ZDPbtbL4S7
Problem Statement:
• The goal of this assignment is to implement the K-Means and K-Medoid algorithms to cluster a dataset consisting of socio-economic and health factors of countries, and to determine the overall development of each country.
Implementation:
• Implement K-Means and K-Medoid algorithms to cluster the given dataset as follows:
◦ Perform standard data preprocessing operations such as data cleaning (handling missing values) and data scaling (handling outliers)
◦ Perform 5-fold cross validation
◦ Classify the countries according to the following categories:
▪ Developed Country
▪ Developing Country
▪ Underdeveloped Country
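The two algorithms named above can be sketched from scratch with NumPy. This is a minimal sketch under stated assumptions, not a reference solution. The function names (standardize, k_means, k_medoids) and their parameters are illustrative. Z-score standardization is only one possible scaling choice. The K-Medoid routine uses a simple alternating update rather than the full PAM swap search. The input is assumed to be a numeric array whose missing values have already been handled.

```python
import numpy as np


def standardize(X):
    """Scale each feature to zero mean and unit variance (one possible scaling choice)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (X - mean) / std


def pairwise_sq_dist(A, B):
    """Squared Euclidean distance between every row of A and every row of B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)


def k_means(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with random initialisation. Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = pairwise_sq_dist(X, centroids).argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    labels = pairwise_sq_dist(X, centroids).argmin(axis=1)
    return centroids, labels


def k_medoids(X, k, n_iter=100, seed=0):
    """Alternating k-medoids (not full PAM). Returns (medoid_indices, labels)."""
    rng = np.random.default_rng(seed)
    dist = np.sqrt(pairwise_sq_dist(X, X))
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = dist[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                # New medoid: the member with the smallest total distance to its cluster.
                within = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = dist[:, medoids].argmin(axis=1)
    return medoids, labels


if __name__ == "__main__":
    # Toy stand-in data, NOT the country dataset: two obvious groups on a line.
    X_toy = np.array([[0.0], [0.1], [10.0], [10.1]])
    print(k_means(standardize(X_toy), k=2)[1])
    print(k_medoids(standardize(X_toy), k=2)[1])
```

Cluster indices returned by either function are arbitrary. Mapping them to the three development categories (for example by comparing mean income or child mortality per cluster) is a separate step that this sketch leaves open. Random initialisation can also settle in a poor local optimum, so running with several seeds and keeping the best result is a common precaution.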
Dataset:
• Link to dataset: https://www.kaggle.com/datasets/rohan0301/unsupervised-learning-on-country-data
Documents to submit:
• Model code
• Accuracy, Precision, Recall and F1 scores for each fold
• Visualization of the clusters after the model has converged
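The per-fold scores listed above could be computed along the following lines. This is a hedged sketch using only the standard library. The helper names (k_fold_indices, classification_scores) are hypothetical. Macro averaging over classes is an assumption, since the assignment does not say which averaging to use. The sketch also assumes that reference category labels exist and that cluster indices have already been mapped to those categories. The assignment does not state where such labels come from.

```python
def k_fold_indices(n_samples, n_folds=5):
    """Yield (train_idx, test_idx) index lists for contiguous, near-equal folds.

    Shuffling is omitted for simplicity; shuffle the rows first if order matters.
    """
    indices = list(range(n_samples))
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def classification_scores(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall and F1."""
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n_classes = len(classes)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1s) / n_classes,
    }
```

In a full pipeline, each fold would fit the clustering on the training indices, assign the held-out rows to the nearest centroid or medoid, and pass the resulting labels to classification_scores. scikit-learn's KFold and precision_recall_fscore_support provide equivalent functionality if a framework is preferred.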