Your task is to prepare the data, run an analysis, answer the research questions below and write a report on your findings.
You must submit two files: the R code that runs your analysis and your report in PDF or MS Word format.
To obtain the maximum available marks you should aim to:
1. Code all requested components.
2. Use a clear style of code presentation. Code clarity is an important part of your submission, so choose meaningful variable names and use comments. You don't need to comment every single line, as this will affect readability, but you should aim to comment at least each section of code.
3. Have the code run successfully.
4. Output the information in a presentable manner and present your written analysis of the output.
Plagiarism is a specific form of academic misconduct. Although the University encourages discussing work with others and the Social Forum will support this, ultimately this submission is to represent your individual work. If plagiarism is found, all parties will be penalised. You should retain copies of all assignment computer files used during development. These files must remain unchanged after submission, for the purpose of checking if required.
For the purpose of this exam, a “paragraph” is considered to consist of approximately 6 – 8 lines. You are welcome to exceed this amount.
This exam appears longer than it actually is – explanations are given to help you understand the requested analyses.
You do not need to write specialised code as you did for the assignments/exercises. You should be able to find nearly all the code you need in the R files provided throughout the course, via case studies and other examples. Copying and pasting from the R code provided should give you nearly 100% of the code needed for this exam, with a few alterations on your part (e.g. filenames, variable names etc).
Dataset
The data for this question are the responses to the sensometric qualities of chocolate that can be purchased in supermarkets. Two groups were asked to rate the qualities of the chocolates. The first group contained a panel of sensometric experts. The second group contained a panel of volunteers chosen to represent ‘regular shoppers’ who underwent a three-hour sensometric training session before rating the qualities of the chocolate.
The responses were recorded on a scale from 0 to 10, with 0 indicating the absence of the sensometric quality and 10 indicating that it is fully present. There are 14 sensometric variables (Chocolate Aroma through to Granular Texture in the data file) and a variable, Role, indicating whether the responses were provided by experts or amateurs.
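As an illustration of the data layout described above, the following is a minimal sketch of an initial check and split. It uses simulated ratings rather than the real data file, and the column names Attr01 to Attr14 are hypothetical placeholders for the 14 sensometric variables; only the variable Role and the 0 to 10 scale come from the description above.

```r
# Simulated stand-in for the exam data: 14 ratings on a 0-10 scale plus Role.
# Column names Attr01..Attr14 are hypothetical placeholders, not the real names.
set.seed(1)
n_raters <- 60
ratings <- as.data.frame(
  matrix(round(runif(n_raters * 14, min = 0, max = 10), 1), ncol = 14)
)
names(ratings) <- sprintf("Attr%02d", 1:14)
choc <- data.frame(
  Role = rep(c("Expert", "Amateur"), each = n_raters / 2),
  ratings
)

# Basic checks: missing values and ratings outside the 0-10 scale.
sensory_cols   <- setdiff(names(choc), "Role")
n_missing      <- sum(is.na(choc[sensory_cols]))
n_out_of_range <- sum(choc[sensory_cols] < 0 | choc[sensory_cols] > 10,
                      na.rm = TRUE)

# Split by Role for the two factor analyses; drop Role for clustering.
expert_data  <- choc[choc$Role == "Expert",  sensory_cols]
amateur_data <- choc[choc$Role == "Amateur", sensory_cols]
cluster_data <- choc[, sensory_cols]
```

With the real file, the simulated block would be replaced by whatever import call suits the file format, and any rows removed or recoded at this stage should be reported in the initial data discussion.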
Your task
It is of interest to determine if experts perceive supermarket chocolate differently to non-experts (the amateurs). You have to run an analysis and prepare a report using two types of analysis: EFA and clustering (weighted equally in terms of your grade). Specifically, your analysis should include:
Initial data discussion: Write an explanation of the data and of any data manipulation you performed prior to analysis.
Research methods introduction: For each method separately, write a short explanation (approximately 1 paragraph) of the analysis to be performed.
Exploratory Factor Analysis: conduct two separate exploratory factor analyses: the first for the expert responses, the other for the amateur responses. You may present the analyses side-by-side or in sequence; whatever you believe is best. For each Exploratory Factor Analysis, you only need to include the following:
• If appropriate, Cronbach’s alpha output and a short discussion (2 – 3 lines) of whether the data is trustworthy and why.
• Correlation output of your choosing (graphical and/or numerical) with an accompanying discussion (3 – 4 lines). If numerical, round the correlations to 2 digits.
• A single paragraph explaining the outcome of the determinant test, Bartlett’s test of sphericity and the KMO statistic for both data sets. Do not include R output.
• Your decision regarding the number of factors to estimate (scree plot may be shown, do not show the R console output).
• The FINAL factor solution. You do not need to discuss results of any of the other solutions, however you should justify your final factor solution, including loadings, and name the factors in each analysis. You should also include up to two sentences indicating whether the test of residuals was passed and whether the factors are correlated.
• All factors should be named and an explanation as to how you come up with these names should be included.
• Based on the factor analysis results and your chosen factor names, discuss the factors that have emerged from the study. What types of differences or similarities (if any) exist between the expert and amateur sensometric ratings?
• Conclusions: write 2 paragraphs of conclusions based on your analysis.
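The diagnostics and factor extraction requested in the list above can be sketched in base R as follows. This is one possible approach under stated assumptions, not the course's own code: the data are simulated with a known three-factor structure, the column names are hypothetical, the statistics are computed by hand rather than with a dedicated package, and the choice of three factors with a promax rotation is for illustration only. The same steps would be run once for the expert data and once for the amateur data.

```r
# Simulated items with three latent factors (assumption, not the exam data).
set.seed(42)
n <- 200
latent  <- matrix(rnorm(n * 3), ncol = 3)
pattern <- cbind(c(rep(0.8, 5), rep(0, 9)),
                 c(rep(0, 5), rep(0.8, 5), rep(0, 4)),
                 c(rep(0, 10), rep(0.8, 4)))
items <- latent %*% t(pattern) + matrix(rnorm(n * 14, sd = 0.6), ncol = 14)
colnames(items) <- sprintf("Attr%02d", 1:14)   # hypothetical names

p <- ncol(items)
R <- cor(items)
print(round(R, 2))                             # correlations to 2 digits

# Cronbach's alpha from the item covariance matrix.
S <- cov(items)
alpha <- (p / (p - 1)) * (1 - sum(diag(S)) / sum(S))

# Determinant of the correlation matrix (very small values suggest
# multicollinearity).
det_R <- det(R)

# Bartlett's test of sphericity: H0 is that R is an identity matrix.
chi_sq      <- -((n - 1) - (2 * p + 5) / 6) * log(det_R)
bartlett_df <- p * (p - 1) / 2
bartlett_p  <- pchisq(chi_sq, bartlett_df, lower.tail = FALSE)

# Overall KMO: squared correlations relative to squared partial correlations.
R_inv   <- solve(R)
partial <- -R_inv / sqrt(outer(diag(R_inv), diag(R_inv)))
diag(partial) <- 0
R_off <- R
diag(R_off) <- 0
kmo <- sum(R_off^2) / (sum(R_off^2) + sum(partial^2))

# Scree plot of the eigenvalues to help choose the number of factors.
eigenvalues <- eigen(R)$values
plot(eigenvalues, type = "b", xlab = "Factor number", ylab = "Eigenvalue",
     main = "Scree plot (simulated data)")
abline(h = 1, lty = 2)

# Maximum-likelihood EFA with an oblique (promax) rotation.
efa <- factanal(items, factors = 3, rotation = "promax")
print(efa$loadings, cutoff = 0.3)

# Residual check: share of off-diagonal residual correlations above 0.05.
# The reproduced matrix does not depend on the rotation, so an unrotated
# fit keeps the calculation simple.
efa_unrotated <- factanal(items, factors = 3, rotation = "none")
L <- unclass(efa_unrotated$loadings)
residuals_R <- R - (L %*% t(L) + diag(efa_unrotated$uniquenesses))
prop_large_residuals <- mean(abs(residuals_R[upper.tri(residuals_R)]) > 0.05)
```

Printing the whole `efa` object also reports the uniquenesses and the likelihood-ratio test that the chosen number of factors is sufficient. Which thresholds count as passing each check should follow the conventions taught in the course.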
Clustering Analysis: For this question, you are asked to conduct a clustering analysis using both hierarchical and partitional clustering techniques on the entire data set, combining experts and amateurs. The variable Role should not be used for clustering; use only the 14 sensometric variables. Specifically, your analysis should include:
• Hierarchical clustering: conduct hierarchical clustering on the data, choosing an appropriate AGNES-based method from single, complete or average linkage, or Ward’s method. Ensure you justify your choice in your write-up and include the resulting dendrogram, as well as a discussion of the outcomes of hierarchical clustering on your data.
• Partitional clustering: conduct a partitional clustering of your data using K-means. Ensure you explain and include any relevant R output (including graphics) supporting your choice of k, the number of clusters.
• Validation: as a form of cluster validation, consider the following:
o If there are obvious outliers or distances that should be removed, identify these in your write-up and re-run your chosen Partitional Clustering algorithm, adjusting k if necessary. Include justification of your choice of the new value for k.
o If there are no obvious outliers/distances that should be removed, then explain this conclusion with justification. In this case re-run your chosen Partitional Clustering algorithm for a different value of k to that used above. Include justification of your choice for the new value for k.
• Select one best solution (from any method), analyse the values of the 14 sensometric variables for each cluster, describe the observed patterns and name your clusters. Compare cluster membership to the variable Role (use the function table to get a cross-tabulation). Are there any patterns?
• Conclusions: write 2 paragraphs of conclusions based on your analysis including a statement regarding which clustering solution is the better one and why.
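The clustering steps listed above can be sketched in base R as follows. This is a minimal illustration under stated assumptions: the data are simulated as two well-separated groups, the column names are hypothetical, `hclust` with Ward's method is used as a base-R stand-in for an AGNES fit (the `cluster` package provides `agnes` itself), and k = 2 is chosen only because the simulated data were built that way. Real data will rarely separate this cleanly.

```r
# Simulated stand-in: two groups of raters on 14 attributes (0-10 scale).
set.seed(7)
group_means <- rbind(rep(3, 14), rep(7, 14))
role <- rep(c("Expert", "Amateur"), each = 40)
ratings <- t(sapply(rep(1:2, each = 40),
                    function(g) rnorm(14, mean = group_means[g, ], sd = 1)))
ratings <- pmin(pmax(ratings, 0), 10)          # keep values on the 0-10 scale
colnames(ratings) <- sprintf("Attr%02d", 1:14) # hypothetical names

# Hierarchical clustering. All variables share one scale, so the ratings
# are not standardised here.
dist_matrix <- dist(ratings)                   # Euclidean distances
hc_ward <- hclust(dist_matrix, method = "ward.D2")
plot(hc_ward, labels = FALSE, main = "Ward dendrogram (simulated data)")
hc_clusters <- cutree(hc_ward, k = 2)

# K-means: within-cluster sum of squares for k = 1..8 (elbow plot).
max_k <- 8
wss <- sapply(1:max_k, function(k) {
  kmeans(ratings, centers = k, nstart = 25)$tot.withinss
})
plot(1:max_k, wss, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster SS", main = "Elbow plot (simulated data)")

km <- kmeans(ratings, centers = 2, nstart = 25)

# Validation aid: distance of each rater to its own cluster centre.
# The 3-standard-deviation cut-off is an arbitrary illustrative choice.
dist_to_centre <- sqrt(rowSums((ratings - km$centers[km$cluster, ])^2))
possible_outliers <- which(
  dist_to_centre > mean(dist_to_centre) + 3 * sd(dist_to_centre)
)

# Cluster profiles: mean of each sensometric variable per cluster.
cluster_means <- aggregate(as.data.frame(ratings),
                           by = list(Cluster = km$cluster), FUN = mean)
print(round(cluster_means, 2))

# Cross-tabulate cluster membership against Role (Role was not used above).
role_table <- table(Cluster = km$cluster, Role = role)
print(role_table)
```

Comparing `hc_clusters` with `km$cluster` using `table` gives a quick view of how far the hierarchical and K-means solutions agree, which can support the statement about which solution is better.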
Submission
You must submit a formal report with your research findings in MS Word or PDF format and R-code.