i. Write code in R or Python that calculates the correlation between “Column2” and “Column3” of a “dataframe”. [1 Mark]
ii. The above dataset has been loaded for you in R or Python in a variable named “dataframe”. Write code that selects only the rows for which the parameter is Alpha. [1 Mark]
iii. Most work in Python or R uses the system's internal memory, and with large datasets the Python or R workspace may be unable to hold all the data in memory. Removing unused objects is one solution. Write a command that removes the rows with the value “Beta”. [1 Mark]
iv. State and explain the techniques and tools (R or Python packages) that are used to preprocess data so that it is ready for data mining. [5 Marks]

(b) Suppose that your local bank has a data mining system. The bank has been studying your debit card usage patterns. Noticing that you make many transactions at home renovation stores, the bank decides to contact you, offering information regarding their special loans for home improvements.
i. Briefly explain how this may conflict with your right to privacy. [2 Marks]
ii. Describe a privacy-preserving data mining method that may allow the bank to perform customer pattern analysis without infringing on its customers’ right to privacy. [2 Marks]

(c) Data quality can be assessed in terms of several issues, including accuracy, completeness, and consistency. For each of the above three issues:
i. Briefly discuss how data quality assessment can depend on the intended use of the data, giving examples. [2 Marks]
ii. Propose TWO other dimensions of data quality. [2 Marks]

(d) In real-world data, tuples with missing values for some attributes are a common occurrence. Describe any TWO methods for handling this problem. [2 Marks]

(e) Briefly describe any TWO issues to consider during data integration. Give an example for each case. [2 Marks]

(f) What are the differences between the three main types of data warehouse usage, namely:
i. Information processing [1 Mark]
ii. Analytical processing [1 Mark]
iii. Data mining [1 Mark]
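
For orientation on the three coding questions at the start of this section (correlation, selecting the Alpha rows, removing the Beta rows), the following is a minimal pandas sketch, not a model answer. The dataset the paper refers to is not reproduced here, so the small table below is made up purely for illustration, and the column name "Parameter" is an assumption; only "Column2", "Column3" and the variable name "dataframe" come from the questions.

```python
import pandas as pd

# Hypothetical stand-in for the exam's dataset, which is not shown here.
# The column name "Parameter" and every value below are assumptions.
dataframe = pd.DataFrame({
    "Parameter": ["Alpha", "Beta", "Alpha", "Gamma", "Beta"],
    "Column2": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Column3": [2.0, 4.0, 6.0, 8.0, 10.0],
})

# i. Pearson correlation between Column2 and Column3.
correlation = dataframe["Column2"].corr(dataframe["Column3"])

# ii. Keep only the rows whose Parameter is "Alpha".
alpha_rows = dataframe[dataframe["Parameter"] == "Alpha"]

# iii. Drop the rows whose Parameter is "Beta".
without_beta = dataframe[dataframe["Parameter"] != "Beta"]

print(correlation)
print(alpha_rows)
print(without_beta)
```

Boolean masks return new objects and leave "dataframe" unchanged. To actually release memory, as question iii hints, one would reassign the result to "dataframe" or delete the original with del.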