Question 1 (For Assessment)
1. Load the data set question1.RData into R.
2. Compute the following 2-class clusterings of the data:
   - A hierarchical clustering using single linkage
   - A hierarchical clustering using complete linkage
   - A 2-cluster k-means clustering (with nstart=30)
3. For each clustering, make a plot of the data coloured according to the cluster each point belongs to.
4. Write a short paragraph commenting on the different clusterings. It should explain why the clusterings differ and which clustering is preferable.
Instructions for submission: Submit a PDF containing the three plots produced in Step 3 and the explanation from Step 4.
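The calls involved in the clustering and plotting steps can be sketched as follows. This is only a minimal illustration on simulated data: the object name and contents of question1.RData are not stated here, so the matrix `x` and the variable names `cl_single`, `cl_complete` and `cl_kmeans` are assumptions. It uses the base R functions `dist`, `hclust`, `cutree` and `kmeans`.

```r
# Simulated stand-in for the assessed data (assumption: the real data is a
# numeric matrix with one row per point and two columns).
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))

d <- dist(x)  # pairwise Euclidean distances

# Hierarchical clusterings, each cut into 2 clusters
cl_single   <- cutree(hclust(d, method = "single"),   k = 2)
cl_complete <- cutree(hclust(d, method = "complete"), k = 2)

# k-means with 2 centres and 30 random restarts
cl_kmeans <- kmeans(x, centers = 2, nstart = 30)$cluster

# One scatter plot per clustering, coloured by cluster label
par(mfrow = c(1, 3))
plot(x, col = cl_single,   pch = 19, main = "Single linkage")
plot(x, col = cl_complete, pch = 19, main = "Complete linkage")
plot(x, col = cl_kmeans,   pch = 19, main = "k-means")
```

With the real data, `x` would be replaced by whatever object `load("question1.RData")` creates; `ls()` after loading shows its name.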
Question 2
Implement Lloyd’s K-Means algorithm. The skeleton should look like this:
my_kmeans <- function(data, k, n_starts) {
  done = FALSE
  n = dim(data)[1] # data is a matrix, where each row is one data point
  cluster = rep(NA, n) # this vector says which cluster each point is in
  # uniformly choose initial cluster centers
  centers = data[sample(x = 1:n, size = k, replace = FALSE), ]
  while (!done) {
    # Do Step 2.1
    # Do Step 2.2
    # Check if the cluster assignments changed. If they have not, set done = TRUE
  }
  return(cluster)
}
Use this algorithm to compute a 4-cluster clustering of the data set in question2.RData. Comment on the clustering.
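For orientation, one possible shape of the loop body is sketched below. It assumes that Step 2.1 is the assignment of each point to its nearest centre and Step 2.2 is the recomputation of each centre as the mean of its points; the steps themselves are not defined in this sheet, so that mapping is a guess. The function name `lloyd_sketch` and the `max_iter` safeguard are hypothetical. The sketch performs a single random start and ignores restarts.

```r
# Hypothetical helper, not the assignment skeleton: a single run of Lloyd's algorithm.
lloyd_sketch <- function(data, k, max_iter = 100) {
  n <- nrow(data)
  # Uniformly choose k distinct data points as the initial centres
  centers <- data[sample(n, k), , drop = FALSE]
  cluster <- rep(NA_integer_, n)
  for (iter in seq_len(max_iter)) {
    # Assignment step: squared Euclidean distance from every point to every centre
    d2 <- sapply(seq_len(k), function(j) rowSums(sweep(data, 2, centers[j, ])^2))
    new_cluster <- max.col(-d2, ties.method = "first")  # index of the nearest centre
    # Stop once no point changes cluster
    if (identical(new_cluster, cluster)) break
    cluster <- new_cluster
    # Update step: each centre becomes the mean of its assigned points
    # (a centre with no points is left where it is)
    for (j in seq_len(k)) {
      members <- cluster == j
      if (any(members)) centers[j, ] <- colMeans(data[members, , drop = FALSE])
    }
  }
  cluster
}

# Usage on simulated data (assumption: two well-separated blobs of 20 points each)
set.seed(2)
toy <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 10), ncol = 2))
labels <- lloyd_sketch(toy, k = 2)
table(labels)
```

Because the result depends on the random initial centres, running the function several times and keeping the run with the smallest within-cluster sum of squares is the usual way to make use of multiple starts.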
Question 3 (Bookwork)
Do Question 2 from Section 10.7 of the textbook.
Question 4 (Bookwork)
Do Question 4 from Section 10.7 of the textbook.