Additional links for data sets and assistance are listed at the bottom.
Competency
Explain statistical techniques used in data science.
Scenario
Sprockets Corporation designs high-end, specialty machine parts for a variety of industries. You have been hired by Sprockets to assist with their data analysis needs. Sprockets Corporation needs a deeper investigation into its sales data. Having reviewed the distributions, management is curious how the variables are related to one another, if at all. If they are related, management wants to know what insights can be drawn from those relationships. In support of their business model, they would like to find the relationship of the other variables to sales and, perhaps more importantly, determine whether those relationships are due to chance or are statistically significant.
Instructions
John Sprocket, CEO of Sprockets Corporation, has requested a data analysis from you to be presented to the leadership team at Sprockets Corporation. You will also share an executive summary including all source code, results and supplemental information necessary for the leadership team.
Leverage the “R” programming language for the following analysis.
- Using the ‘cor’ function, generate individual correlations between SALES and the following parameters: QUANTITYORDERED, PRICEEACH and QTR_ID.
- Perform a multiple regression in R, modeling SALES as a function of QUANTITYORDERED, PRICEEACH and QTR_ID.
- Generate the summary description and provide some explanation of your results.
- Generate a second multiple regression that includes CITY, COUNTRY, DEALSIZE and CUSTOMER along with the original three variables. Convert each of these to a Boolean (0/1) flag so it can be used in a linear regression; this requires limiting each variable to a single assumption about whether or not it takes a particular value. For example, CITY could be represented as IsBoston, set to “1” if the city is Boston and “0” if not; COUNTRY as isCountryUSA, set to “1” for the US and “0” if not; DEALSIZE as isLargedealsize, set to “1” if it is above the median and “0” if not; and CUSTOMER as isCustomerLandOfToys, set to “1” if that is the customer and “0” if not. Only one assumption per variable is necessary for your analysis; you do not have to consider all combinations.
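The steps above can be sketched in R roughly as follows. This is a minimal illustration, not the required solution. The data frame here is a small synthetic stand-in generated in the script, so its numbers say nothing about the real Sprockets data. The values "Land of Toys", the extra city names, and the Small/Medium/Large labels for DEALSIZE are assumptions made only so the example runs on its own. Replace the synthetic block with the actual data set before interpreting anything.

```r
# Synthetic placeholder data (ASSUMPTION): swap in the real data set,
# e.g. with read.csv(), before drawing any conclusions.
set.seed(42)
n <- 200
sales <- data.frame(
  QUANTITYORDERED = sample(10:60, n, replace = TRUE),
  PRICEEACH       = round(runif(n, 30, 100), 2),
  QTR_ID          = sample(1:4, n, replace = TRUE),
  CITY            = sample(c("Boston", "NYC", "Paris", "Madrid"), n, replace = TRUE),
  CUSTOMER        = sample(c("Land of Toys", "Customer A", "Customer B"), n, replace = TRUE),
  stringsAsFactors = FALSE
)
sales$COUNTRY <- ifelse(sales$CITY %in% c("Boston", "NYC"), "USA",
                 ifelse(sales$CITY == "Paris", "France", "Spain"))
sales$SALES <- sales$QUANTITYORDERED * sales$PRICEEACH + rnorm(n, 0, 100)
# ASSUMPTION: DEALSIZE is stored as ordered labels rather than as a number.
sales$DEALSIZE <- as.character(cut(sales$SALES,
                                   breaks = c(-Inf, 1500, 3500, Inf),
                                   labels = c("Small", "Medium", "Large")))

# 1. Individual correlations between SALES and each predictor.
predictors <- c("QUANTITYORDERED", "PRICEEACH", "QTR_ID")
cors <- sapply(predictors, function(v) cor(sales$SALES, sales[[v]]))
print(cors)
# cor.test() adds a p-value, which speaks to the "by chance" question.
print(cor.test(sales$SALES, sales$QTR_ID)$p.value)

# 2. First multiple regression and its summary.
fit1 <- lm(SALES ~ QUANTITYORDERED + PRICEEACH + QTR_ID, data = sales)
print(summary(fit1))

# 3. One 0/1 flag per categorical variable, as the assignment suggests.
sales$IsBoston             <- as.integer(sales$CITY == "Boston")
sales$isCountryUSA         <- as.integer(sales$COUNTRY == "USA")
sales$isCustomerLandOfToys <- as.integer(sales$CUSTOMER == "Land of Toys")
# "Above the median" is read here as ranking above the median size category.
size_rank <- as.integer(factor(sales$DEALSIZE,
                               levels = c("Small", "Medium", "Large")))
sales$isLargedealsize <- as.integer(size_rank > median(size_rank))

# 4. Second multiple regression: original three variables plus the four flags.
fit2 <- lm(SALES ~ QUANTITYORDERED + PRICEEACH + QTR_ID +
             IsBoston + isCountryUSA + isLargedealsize + isCustomerLandOfToys,
           data = sales)
print(summary(fit2))
```

In each summary() output, the Pr(>|t|) column gives a per-coefficient p-value and the final line reports the overall F-test; these are the usual places to look when judging whether a relationship is statistically significant. Comparing the adjusted R-squared of fit1 and fit2 is one simple way to see whether the added flags explain more of the variation in SALES.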
Links
https://learning.rasmussen.edu/bbcswebdav/pid-5855337-dt-content-rid-151629590_1/xid-151629590_1
http://rasmussen.libanswers.com/faq/32381
http://rasmussen.libanswers.com/faq/32410
http://proquest.safaribooksonline.com.ezproxy.rasmussen.edu/book/programming/r/9781430245544/chapter-4-summary-statistics/sec12_html?uicode=rasmussen
http://proquest.safaribooksonline.com.ezproxy.rasmussen.edu/book/programming/r/9781788627306/regression-analysis-with-r/cover_xhtml?uicode=rasmussen