Week 2 Tutorial: Clustering Solution

Question 1 (For Assessment)

1. Load the data set question1.RData into R.

2. Compute the following three clusterings of the data, each with 2 clusters:

   (a) A hierarchical clustering using single linkage

   (b) A hierarchical clustering using complete linkage

   (c) A 2-cluster k-means clustering (with nstart=30)

3. For each clustering, make a plot of the data coloured according to which cluster each point is in.

4. Write a short paragraph commenting on the different clusterings. It should explain why the clusterings are different and which clustering is preferable.

Instructions for submission: Submit a PDF containing the three plots produced in Step 3 and the explanation from Step 4.
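
The computing and plotting steps of Question 1 might be sketched as follows. This is only an illustration under stated assumptions: the name of the object stored in question1.RData is not given here, so a synthetic two-column matrix x stands in for the real data, Euclidean distance (the default of dist) is assumed for the hierarchical clusterings, and the helper names cluster_three_ways and plot_clusterings are hypothetical.

```r
# Hypothetical helper: compute the three 2-cluster clusterings of a numeric matrix x.
cluster_three_ways <- function(x, k = 2, nstart = 30) {
  d <- dist(x)  # Euclidean distances (an assumption; the worksheet does not say)
  list(
    single   = cutree(hclust(d, method = "single"), k = k),
    complete = cutree(hclust(d, method = "complete"), k = k),
    kmeans   = kmeans(x, centers = k, nstart = nstart)$cluster
  )
}

# Hypothetical helper: one scatter plot per clustering, coloured by cluster label.
plot_clusterings <- function(x, labels) {
  old_par <- par(mfrow = c(1, length(labels)))
  on.exit(par(old_par))
  for (name in names(labels)) {
    plot(x, col = labels[[name]], pch = 19, main = name,
         xlab = "x1", ylab = "x2")
  }
}

# Synthetic stand-in for the data in question1.RData (NOT the real data).
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2))

labels <- cluster_three_ways(x)
```

Calling plot_clusterings(x, labels) then draws the three plots side by side. With the real data, x would be replaced by the matrix obtained after load("question1.RData").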

Question 2

Implement Lloyd’s K-Means algorithm. The skeleton should look like this:

my_kmeans <- function(data, k, n_starts) {
  done = FALSE
  n = dim(data)[1] #data is a matrix, where each row is one data point
  cluster = rep(NA,n) #this vector says which cluster each point is in
  #uniformly choose initial cluster centers
  centers = data[sample(x=1:n,size = k, replace = FALSE),]
  while (!done) {
    # Do Step 2.1
    # Do Step 2.2
    # Check if the cluster assignments changed. If they have not, set done=TRUE
  }
  return(cluster)
}

Use this algorithm to produce a 4-cluster clustering of the data set in question2.RData. Comment on the clustering.
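
One possible way to fill in the skeleton is sketched below. It rests on several assumptions that the worksheet does not confirm: Step 2.1 is taken to be "assign each point to its nearest center" and Step 2.2 "recompute each center as the mean of its points", n_starts is read as the number of random restarts (the run with the lowest within-cluster sum of squares is kept, in the spirit of nstart in kmeans), and the helper lloyd_once and the max_iter safeguard are hypothetical additions. Synthetic data stand in for question2.RData.

```r
# Hypothetical helper: one run of Lloyd's algorithm from a random start.
lloyd_once <- function(data, k, max_iter = 100) {
  n <- nrow(data)
  cluster <- rep(NA_integer_, n)
  # uniformly choose k distinct data points as initial centers
  centers <- data[sample(n, size = k, replace = FALSE), , drop = FALSE]
  for (iter in seq_len(max_iter)) {
    # Step 2.1 (assumed): squared distance from every point to every center
    d2 <- matrix(sapply(seq_len(k), function(j) {
      rowSums(sweep(data, 2, centers[j, ])^2)
    }), nrow = n)
    new_cluster <- max.col(-d2, ties.method = "first")  # nearest center
    # stop once the assignments no longer change
    if (identical(new_cluster, cluster)) break
    cluster <- new_cluster
    # Step 2.2 (assumed): move each center to the mean of its points
    for (j in seq_len(k)) {
      members <- cluster == j
      # an empty cluster keeps its previous center
      if (any(members)) centers[j, ] <- colMeans(data[members, , drop = FALSE])
    }
  }
  wss <- sum((data - centers[cluster, , drop = FALSE])^2)
  list(cluster = cluster, centers = centers, wss = wss)
}

my_kmeans <- function(data, k, n_starts) {
  data <- as.matrix(data)
  best <- NULL
  for (s in seq_len(n_starts)) {
    fit <- lloyd_once(data, k)
    if (is.null(best) || fit$wss < best$wss) best <- fit
  }
  return(best$cluster)
}

# Synthetic stand-in for question2.RData (NOT the real data): four blobs.
set.seed(1)
true_centers <- rbind(c(0, 0), c(5, 0), c(0, 5), c(5, 5))
demo <- do.call(rbind, lapply(1:4, function(i) {
  cbind(rnorm(25, true_centers[i, 1], 0.3), rnorm(25, true_centers[i, 2], 0.3))
}))
demo_labels <- my_kmeans(demo, k = 4, n_starts = 10)
```

Because Lloyd's algorithm only finds a local optimum, a single start can merge two true groups and split another; comparing runs with n_starts = 1 and a larger value is a simple way to see this when commenting on the clustering.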

Question 3 (Bookwork)

Do Question 2 from Section 10.7 of the textbook.

Question 4 (Bookwork)

Do Question 4 from Section 10.7 of the textbook.
