Data are available for 201 countries on five demographic and economic characteristics: the rate of population growth (popgrwth), GDP per capita in US dollars (gdpcap), births per 1000 persons per year (birthrate), the proportion of the workforce employed in agriculture (agriculture) and the rate of literacy (literacy). A Principal Component Analysis is to be carried out to look for interesting structure in these data.
a) Why is it a good idea to first scale the data? (1 Mark)
b) Figure 1 below is a scatter plot of the first two principal components of the scaled data.
Figure 1
i). Do you think there is correlation between the first and second principal components? Justify your answer. (3 Marks)
ii). Describe the interesting features of Figure 1. (4 Marks)
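
For readers who want to experiment with the setup in parts a) and b), the following is a minimal sketch rather than the analysis behind Figure 1. The country data are not reproduced here, so the table `X` is synthetic, the column scales in `scales` are made up, and the helper `pca_scores` is a hypothetical name introduced only for this illustration. It runs a PCA through NumPy's singular value decomposition with and without scaling, and prints the sample correlation between the first two component scores.

```python
import numpy as np

# Synthetic stand-in for the 201 x 5 country table (NOT the real data).
rng = np.random.default_rng(0)
names = ["popgrwth", "gdpcap", "birthrate", "agriculture", "literacy"]
# Hypothetical scales, chosen only so the columns have very different units.
scales = np.array([1.0, 10000.0, 10.0, 0.2, 0.2])
X = rng.normal(size=(201, 5)) * scales


def pca_scores(data, scale=True):
    """Return (scores, explained_variance_ratio) from a PCA computed via SVD."""
    centred = data - data.mean(axis=0)
    if scale:
        # Divide each column by its sample standard deviation.
        centred = centred / data.std(axis=0, ddof=1)
    # Rows of vt are the principal directions; s holds the singular values.
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u * s  # same as centred @ vt.T
    var = s**2 / (len(data) - 1)
    return scores, var / var.sum()


scores_raw, ratio_raw = pca_scores(X, scale=False)
scores_std, ratio_std = pca_scores(X, scale=True)

print("unscaled variance ratios:", np.round(ratio_raw, 3))
print("scaled variance ratios:  ", np.round(ratio_std, 3))
# Sample correlation between the PC1 and PC2 scores of the scaled data.
print("corr(PC1, PC2):", np.corrcoef(scores_std[:, 0], scores_std[:, 1])[0, 1])
```

Plotting `scores_std[:, 0]` against `scores_std[:, 1]` gives a scatter plot of the same kind as Figure 1, although the synthetic data will not show the structure of the real countries.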
Describe in a sentence or two the main differences between supervised and unsupervised learning.