Week 2 Tutorial: Clustering Solution




Question 1 (For Assessment)




1. Load the data set question1.RData into R.

2. Compute the following two-cluster clusterings of the data:

   (a) A hierarchical clustering using single linkage

   (b) A hierarchical clustering using complete linkage

   (c) A 2-cluster k-means clustering (with nstart = 30)

3. For each clustering, plot the data with each point coloured according to the cluster it belongs to.

4. Write a short paragraph commenting on the different clusterings. It should explain why the clusterings differ and which clustering is preferable.

Instructions for submission: Submit a PDF containing the three plots produced in Step 3 and the explanation from Step 4.
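The computations in this question can be sketched in base R as follows. This is only an illustration under stated assumptions: the name of the object stored in question1.RData is not given here, so the sketch uses a synthetic two-column matrix x as a stand-in, and the variable names (x, d, cl_single, cl_complete, cl_kmeans) are hypothetical.

```r
# Synthetic stand-in for the data in question1.RData (assumption: the real
# data is a numeric matrix with one row per point and two columns).
set.seed(1)
x <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 4), ncol = 2)
)

d <- dist(x)  # Euclidean distances between all pairs of points

# Hierarchical clusterings, each cut into 2 clusters
cl_single   <- cutree(hclust(d, method = "single"),   k = 2)
cl_complete <- cutree(hclust(d, method = "complete"), k = 2)

# 2-cluster k-means with 30 random restarts
cl_kmeans <- kmeans(x, centers = 2, nstart = 30)$cluster

# One plot per clustering, points coloured by cluster label
op <- par(mfrow = c(1, 3))
plot(x, col = cl_single,   pch = 19, main = "Single linkage")
plot(x, col = cl_complete, pch = 19, main = "Complete linkage")
plot(x, col = cl_kmeans,   pch = 19, main = "k-means")
par(op)
```

With the real data, x would be replaced by whatever object load("question1.RData") creates; load returns the names of the loaded objects invisibly, so print(load("question1.RData")) shows what to use.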




Question 2




Implement Lloyd’s K-Means algorithm. The skeleton should look like this:




my_kmeans <- function(data, k, n_starts) {
  done <- FALSE
  n <- nrow(data)  # data is a matrix, where each row is one data point
  cluster <- rep(NA, n)  # this vector says which cluster each point is in

  # uniformly choose initial cluster centers
  centers <- data[sample(x = 1:n, size = k, replace = FALSE), ]

  while (!done) {
    # Do Step 2.1

    # Do Step 2.2

    # Check if the cluster assignments changed. If they have not, set done <- TRUE
  }

  return(cluster)
}
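One possible way to fill in the skeleton is sketched below. It rests on several assumptions that are not stated in this sheet: that Step 2.1 is the assignment step (each point goes to its nearest center) and Step 2.2 is the update step (each center becomes the mean of its points), and that n_starts means "run the algorithm n_starts times and keep the run with the smallest within-cluster sum of squares", by analogy with nstart in kmeans. The names lloyd_once, my_kmeans_sketch and max_iter are hypothetical.

```r
# One run of Lloyd's algorithm from a random initialisation.
# Assumption: Step 2.1 = assignment, Step 2.2 = center update.
lloyd_once <- function(data, k, max_iter = 100) {
  n <- nrow(data)
  cluster <- rep(NA_integer_, n)
  centers <- data[sample(n, k), , drop = FALSE]

  for (iter in seq_len(max_iter)) {
    # Assignment: squared Euclidean distance from every point to every center
    d2 <- sapply(seq_len(k), function(j) rowSums(sweep(data, 2, centers[j, ])^2))
    d2 <- matrix(d2, nrow = n)  # keep an n x k matrix even when k == 1
    new_cluster <- max.col(-d2, ties.method = "first")  # index of nearest center

    # Stop once no assignment has changed
    if (identical(new_cluster, cluster)) break
    cluster <- new_cluster

    # Update: each center becomes the mean of its points;
    # a center that has lost all its points is left where it is
    for (j in seq_len(k)) {
      if (any(cluster == j)) {
        centers[j, ] <- colMeans(data[cluster == j, , drop = FALSE])
      }
    }
  }

  # Total within-cluster sum of squares, used to compare restarts
  wss <- sum((data - centers[cluster, , drop = FALSE])^2)
  list(cluster = cluster, centers = centers, wss = wss)
}

# Repeat from n_starts random initialisations and keep the best run
my_kmeans_sketch <- function(data, k, n_starts = 1) {
  runs <- lapply(seq_len(n_starts), function(i) lloyd_once(data, k))
  best <- which.min(sapply(runs, function(r) r$wss))
  runs[[best]]$cluster
}

# Usage on synthetic data: two well-separated groups of 50 points each
set.seed(42)
toy <- rbind(
  matrix(rnorm(100, mean = 0),  ncol = 2),
  matrix(rnorm(100, mean = 10), ncol = 2)
)
cl <- my_kmeans_sketch(toy, k = 2, n_starts = 5)
```

The max_iter cap is a safeguard added for the sketch; the skeleton itself loops until the assignments stop changing.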







Use this algorithm to compute a 4-cluster clustering of the data set in question2.RData. Comment on the clustering.
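When commenting on a 4-cluster result, it can help to have a reference clustering from the built-in kmeans on the same data. The sketch below shows the kind of summary that is useful for this. It uses a synthetic matrix x2 as a stand-in, since the contents of question2.RData are not described here; x2, fit and the choice of four Gaussian groups are assumptions made for illustration.

```r
# Synthetic stand-in for the data in question2.RData (assumption: a numeric
# matrix with one row per point); four groups of 40 points around the
# corners of a square.
set.seed(7)
corners <- rbind(c(0, 0), c(0, 6), c(6, 0), c(6, 6))
x2 <- do.call(rbind, lapply(1:4, function(j) {
  cbind(rnorm(40, mean = corners[j, 1]), rnorm(40, mean = corners[j, 2]))
}))

# Reference 4-cluster k-means with several random restarts
fit <- kmeans(x2, centers = 4, nstart = 30)

fit$size          # number of points in each cluster
fit$tot.withinss  # total within-cluster sum of squares (smaller is tighter)

# Points coloured by cluster, with the fitted centers marked
plot(x2, col = fit$cluster, pch = 19, main = "4-cluster k-means")
points(fit$centers, pch = 4, cex = 2, lwd = 2)
```

Cluster labels are arbitrary, so two clusterings of the same data are best compared with a cross-tabulation such as table(fit$cluster, other_labels) rather than by checking the labels for equality; other_labels here stands for the vector returned by your own implementation.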




Question 3 (Bookwork)




Do Question 2 from Section 10.7 of the textbook.




Question 4 (Bookwork)




Do Question 4 from Section 10.7 of the textbook.































































































































































