Use the K-means clustering method to divide Pima Indian women into three groups based on their age and BMI. Provide the centroid for each cluster. Create a scatter plot in which each cluster is identified by a different symbol. How do these three groups differ from each other in terms of age and BMI?

1. In Sect. 3.6, we used the GBSG (German Breast Cancer Study Group) data set from the mfp package to create a new variable called rfs (recurrence-free survival) such that rfs=“No” if the patient had at least one recurrence or died (i.e., cens=1) and rfs=“Yes” otherwise. Use the K-means clustering method to divide the patients into two groups based on their age, tumor size (tumsize), and number of positive nodes (posnodal). Make sure the options Print cluster summary and Assign clusters to the data set are checked. Explain how the two groups differ using the cluster-specific summaries. R-Commander creates a new variable, called KMeans by default, to identify the two groups. Use this variable along with rfs to create a 2×2 contingency table in which the rows show the clusters and the columns show the values of rfs. What are the sample proportions of recurrence-free survival for the two groups? Compare the odds of recurrence-free survival between the two groups.
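
As a reminder of the quantities involved, the sample proportions and odds can be written in terms of the cell counts of the 2×2 table. The notation below is an assumption made only for illustration: a_i and b_i denote the number of patients in cluster i (i = 1, 2) with rfs=“Yes” and rfs=“No”, respectively; these symbols do not appear in the data set or in R-Commander output.

```latex
% Assumed notation: a_i = count with rfs = "Yes" in cluster i,
%                   b_i = count with rfs = "No"  in cluster i.
\[
  \hat{p}_i = \frac{a_i}{a_i + b_i}, \qquad
  \widehat{\text{odds}}_i = \frac{\hat{p}_i}{1 - \hat{p}_i} = \frac{a_i}{b_i},
  \qquad i = 1, 2,
\]
\[
  \widehat{\text{OR}} = \frac{\widehat{\text{odds}}_1}{\widehat{\text{odds}}_2}
                      = \frac{a_1\, b_2}{a_2\, b_1}.
\]
```

An odds ratio close to 1 would suggest similar odds of recurrence-free survival in the two clusters, whereas a value far from 1 would suggest that the clusters differ.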

