1. Implement the k-means algorithm for clustering.
2. Suppose that you represent your corpus as a graph in which each document is a node, and the weight of the edge between a pair of nodes is equal to the cosine similarity between them. Interpret the single-linkage clustering algorithm in terms of this similarity graph.
3. Suppose you were given only the similarity graph of the previous exercise, and not the actual documents. How would you perform k-means clustering with this input?
4. For hierarchical clustering algorithms, what is the time complexity of centroid-based merging? How would you make it efficient?
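
As a starting point for the exercises that refer to the similarity graph, the following is a minimal sketch of how such a graph might be built. It assumes raw term-frequency vectors and whitespace tokenization, neither of which the exercises prescribe, and the names `tf_vector`, `cosine`, and `similarity_graph` are illustrative only. It constructs the input to the exercises and does not solve them.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector (assumption: lowercase, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_u * norm_v)


def similarity_graph(docs):
    """Map each unordered node pair (i, j), i < j, to its edge weight."""
    vectors = [tf_vector(d) for d in docs]
    graph = {}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            graph[(i, j)] = cosine(vectors[i], vectors[j])
    return graph


# Hypothetical three-document corpus, used only for illustration.
docs = ["apple banana apple", "banana apple", "car engine"]
graph = similarity_graph(docs)
for (i, j), weight in sorted(graph.items()):
    print(f"doc {i} -- doc {j}: {weight:.3f}")
```

Storing every pair takes space quadratic in the number of documents, which is acceptable for a small corpus. For a large one, a sparse representation that keeps only edges above some similarity threshold may be preferable.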