Order instructions
- Describe the aims of the following data mining tasks: exploratory data analysis, predictive modelling, descriptive modelling, and discovering rules. Give some examples. Do not copy from the course handouts but explain using your own words! (Q1: 20 marks)
The second part of the assignment is based on the churn data set, which was obtained from a Telco provider. The aim of churn prediction is to distinguish churners from non-churners using a set of customer characteristics.
- The first task is to describe the customers themselves. Use various graphs, such as pie charts, stacked bar charts and scatter plots, to describe some interesting characteristics of the data. As a hint, try also to use the churn/non-churn status as a grouping variable in your plots. Then pick out two interesting variables and use descriptive statistics to explain what, if anything, is interesting about them. Finally, use correlation or any other technique to describe the relationship between some of the characteristics (Q2: 20 marks).
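As an illustration of the descriptive step in Q2, the sketch below computes a group mean by churn status and a Pearson correlation using only the Python standard library. The variable names (`minutes`, `charge`, `churned`) and all values are invented for the example and are not taken from the churn data set.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def group_mean(values, groups, label):
    """Mean of the values whose group flag equals the given label."""
    return mean(v for v, g in zip(values, groups) if g == label)


# Hypothetical toy data: usage minutes, charge, and churn flag (1 = churner).
minutes = [100.0, 200.0, 300.0, 400.0]
charge = [17.0, 34.0, 51.0, 68.0]
churned = [0, 0, 1, 1]

r = pearson(minutes, charge)
mean_non_churn = group_mean(minutes, churned, 0)
mean_churn = group_mean(minutes, churned, 1)
print(r, mean_non_churn, mean_churn)
```

In this toy data the charge is an exact multiple of the minutes, so the correlation is essentially 1. In a real data set, a pair of variables this strongly related would carry largely the same information.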
- The Telco firm wants to understand which customer characteristics affect whether he/she will be a churner. Use logistic regression to identify the important factors and obtain a churn score. Draw the ROC curve, calculate the Gini coefficient for this churn score, and explain the meaning of your results (Q3: 20 marks).
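To make the link between the ROC curve and the Gini coefficient in Q3 concrete, here is a minimal sketch in plain Python. It assumes the usual scoring convention Gini = 2 × AUC − 1, where AUC is the area under the ROC curve. The labels and scores are invented. In the assignment, the scores would come from the fitted logistic regression.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of (churner, non-churner) pairs in which the churner
    has the higher score; tied pairs count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one churner and one non-churner")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini_from_auc(auc):
    """Gini coefficient of a score under the convention Gini = 2 * AUC - 1."""
    return 2.0 * auc - 1.0


# Hypothetical churn flags (1 = churner) and churn scores.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.2, 0.1, 0.7]

auc = auc_from_scores(labels, scores)
gini = gini_from_auc(auc)
print(auc, gini)
```

A Gini of 0 corresponds to a score that ranks customers no better than chance, and a Gini of 1 to a score that ranks every churner above every non-churner.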
The third part of the assignment focusses on clustering.
- Explain the following concepts and illustrate with examples (Q4: 20 marks):
k-means clustering
dendrogram
scree diagram
Principal Component Analysis
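For the first concept in the list, k-means clustering, the following is a minimal sketch of the standard iterative procedure (often called Lloyd's algorithm) in plain Python. The data points are invented. Starting from the first k points is a simplifying assumption made so the result is reproducible, whereas practical implementations usually use random or smarter starting centroids.

```python
def kmeans(points, k, iterations=100):
    """Basic k-means: alternate between assigning each point to its nearest
    centroid and moving each centroid to the mean of its assigned points."""
    centroids = [points[i] for i in range(k)]  # naive start: first k points
    assignment = None
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (squared distance).
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


# Hypothetical 2-D data with two visually separate groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
centroids, assignment = kmeans(points, k=2)
print(centroids, assignment)
```

With these points the procedure separates the three points near (1, 1) from the three near (9, 9) after a single update.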
The