Task 2.1) Conduct and report on an exploratory data analysis (EDA) of the housing.csv data set using
the RapidMiner Studio data mining tool and RapidMiner Studio operators.
Provide the following for Task 2.1:
(i) a screen capture of the final EDA process, with a brief description of the EDA process
(ii) a summary of the key results of the exploratory data analysis in Table 2.1 Results of Exploratory
Data Analysis for housing.csv. Table 2.1 should include the key characteristics of each
variable in the housing.csv data set, such as maximum and minimum values, average, standard
deviation, most frequent value (mode), missing values and invalid values.
(iii) a discussion of the key results of the exploratory data analysis presented in Table 2.1, and a
rationale for selecting the top 5 variables for predicting the sale price of a property (Price),
focusing in particular on the relationships of the independent variables with each other and with
the dependent variable Price, drawing on the results of the EDA and relevant literature on
what determines property prices
(20 marks, 250 words)
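The Table 2.1 statistics can be cross-checked outside RapidMiner. The sketch below is a minimal Python illustration, not part of the required RapidMiner process: it assumes each column arrives as raw CSV strings, that missing values are coded as "", "NA" or "?" (an assumption; check how housing.csv actually encodes them), and that a non-empty value that does not parse as a number counts as invalid. The function name `summarise_numeric_column` and the example column are hypothetical.

```python
import statistics
from collections import Counter

# Assumption: these tokens mark a missing value; check how housing.csv encodes them.
MISSING_TOKENS = {"", "NA", "?"}


def summarise_numeric_column(raw_values):
    """Return Table 2.1-style statistics for one numeric column of raw CSV strings."""
    missing = 0
    invalid = 0
    numbers = []
    for raw in raw_values:
        token = raw.strip()
        if token in MISSING_TOKENS:
            missing += 1
            continue
        try:
            numbers.append(float(token))
        except ValueError:
            invalid += 1  # present but not parseable as a number
    summary = {"missing": missing, "invalid": invalid, "count": len(numbers)}
    if numbers:
        summary.update(
            minimum=min(numbers),
            maximum=max(numbers),
            average=statistics.mean(numbers),
            # Most frequent valid value; ties go to the value seen first.
            mode=Counter(numbers).most_common(1)[0][0],
        )
    if len(numbers) >= 2:
        summary["std_dev"] = statistics.stdev(numbers)  # sample standard deviation
    return summary


# Hypothetical column for illustration only, not taken from housing.csv.
rooms = ["3", "2", "", "4", "3", "three"]
print(summarise_numeric_column(rooms))
```

For this toy column the sketch reports one missing value, one invalid value ("three"), a minimum of 2, a maximum of 4, an average and mode of 3, and a sample standard deviation of about 0.82. Other validity rules (for example, rejecting negative prices) would need to be added per variable.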
Hint: The Statistics tab and Chart tab in RapidMiner Studio provide a lot of descriptive statistical
information and the ability to create useful charts, such as bar charts, scatterplots and boxplots,
for EDA. You might also like to run correlations and/or chi-square tests, as
appropriate, to determine which variables contribute most to predicting property sale price (Price).
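The correlation and chi-square checks in the hint can be sketched in a few lines. The following is a minimal Python illustration under stated assumptions: numeric predictors are ranked by the absolute value of their Pearson correlation with Price, and nominal variables are compared through a contingency table of observed counts (for example, a categorical variable against binned Price). The helper names and the example columns are hypothetical. The sketch returns only the chi-square statistic and its degrees of freedom. The p-value would still come from the chi-square distribution (RapidMiner or a statistics library).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant column has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def rank_predictors(predictors, price, top_k=5):
    """Rank numeric predictors by absolute correlation with Price, strongest first."""
    scores = {name: pearson_r(values, price) for name, values in predictors.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)[:top_k]


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency
    table of observed counts (rows = one variable, columns = the other)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical columns for illustration only, not housing.csv values.
predictors = {"a": [1, 2, 3, 4], "b": [4, 3, 2, 1], "c": [1, 3, 2, 4]}
price = [10, 20, 30, 40]
print(rank_predictors(predictors, price, top_k=2))
print(chi_square_statistic([[10, 20], [20, 10]]))
```

Ranking by absolute value keeps strong negative relationships (such as "b" here) alongside positive ones. Correlation among the predictors themselves (for example, `pearson_r(predictors["a"], predictors["b"])`) is also worth checking, because two highly correlated predictors add little over one of them.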
Task 2.2) Build and report on a Linear Regression model for predicting property sale price (Price)
using a RapidMiner data mining process, an appropriate set of data mining operators and a reduced set
of variables from the housing.csv data set, as determined by your exploratory data analysis in Task 2.1.
Provide the following for Task 2.2:
(i) a screen capture of the Final Linear Regression Model process, with a brief description of your Final
Linear Regression Model process
(ii) Table 2.2, named Results of Final Linear Regression Model for Task 2.2 for the housing.csv
data set
(iii) a discussion of the results of the Final Linear Regression Model for the housing.csv data set, drawing on
key outputs (coefficients, standardised coefficients, t-statistic values, p-values,
significance levels, etc.) for predicting property sale price (Price) and on relevant supporting
literature on the interpretation of a Linear Regression Model
(16 marks, 150 words)
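The regression outputs listed in (iii) can be checked by hand with ordinary least squares. The sketch below is a minimal pure-Python illustration, not RapidMiner's implementation. It fits Price on a set of predictors and reports, for each term, the coefficient, its standard error, a standardised coefficient (one common definition: coefficient × SD(predictor) / SD(Price)), the t-statistic (coefficient / standard error) and an approximate two-sided p-value. That p-value uses a normal approximation to the t distribution, which is only reasonable for large samples. RapidMiner's reported values may differ, for example if its operator applies feature selection or regularisation. The function names and the toy data are hypothetical.

```python
import math
import statistics


def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    # Augment [A | I] and reduce the left half to the identity.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors may be perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_ols(predictors, price):
    """Ordinary least squares fit of price on the given predictor columns."""
    names = list(predictors)
    n = len(price)
    # Design matrix: a leading 1 for the intercept, then one column per predictor.
    X = [[1.0] + [float(predictors[name][i]) for name in names] for i in range(n)]
    k = len(X[0])
    if n <= k:
        raise ValueError("need more rows than estimated parameters")
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yv for row, yv in zip(X, price)) for i in range(k)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    residuals = [yv - sum(b * x for b, x in zip(beta, row)) for row, yv in zip(X, price)]
    sigma2 = sum(e * e for e in residuals) / (n - k)  # unbiased residual variance
    sd_price = statistics.stdev(price)
    table = {}
    for i, name in enumerate(["(Intercept)"] + names):
        std_error = math.sqrt(sigma2 * xtx_inv[i][i])
        t_stat = beta[i] / std_error
        table[name] = {
            "coefficient": beta[i],
            "std_error": std_error,
            # Change in Price (in SDs) per one-SD change in the predictor.
            "std_coefficient": None if i == 0
            else beta[i] * statistics.stdev(predictors[name]) / sd_price,
            "t_stat": t_stat,
            # Normal approximation to the two-sided p-value; for small samples
            # use the t distribution with n - k degrees of freedom instead.
            "p_approx": math.erfc(abs(t_stat) / math.sqrt(2)),
        }
    return table


# Toy data for illustration only (not housing.csv); x1 and x2 are hypothetical predictors.
data = {"x1": [1, 2, 3, 4, 5, 6], "x2": [1, 1, 2, 2, 3, 3]}
price = [4.5, 6.5, 8.5, 12.5, 14.0, 17.0]
for name, row in fit_ols(data, price).items():
    print(name, row)
```

The toy Price was constructed as 2 + 3·x1 − x2 plus noise that is uncorrelated with the predictors, so the fit recovers coefficients of 2, 3 and −1 up to floating-point rounding. A large absolute t-statistic (small p-value) indicates a coefficient that is unlikely to be zero. The standardised coefficients make predictors measured in different units comparable.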
Include all appropriate outputs, such as RapidMiner processes, graphs and tables, that support key
aspects of the exploratory data analysis and linear regression model analysis of the housing.csv data
set in your Assignment 2 report.