7. Organizations are often challenged with deciding what to protect and to what extent. Many companies still have difficulty protecting their data in digital format, defining what constitutes sensitive and non-sensitive data, and deciding how to share the data within the company and with the public.

Describe technology’s impact on data security and ethics in data analytics. Use APA-style references wherever necessary to support your discussion.

You must make at least two substantive responses to your classmates’ posts. Respond to these posts in any of the following ways:

· Build on something your classmate said.

· Explain why and how you see things differently.

· Ask a probing or clarifying question.

· Share an insight from having read your classmates’ postings.

· Offer and support an opinion.

· Validate an idea with your own experience.

· Expand on your classmates’ postings.

· Ask for evidence that supports the post.

Discussion Length (word count): At least 250 words

References: At least two peer-reviewed, scholarly journal references.

Reply Post

When replying to a classmate, use 3–5 sentences to offer your opinion on the advantages and disadvantages of their choices.

QUESTION 1

1. The six phases of the Data Analytics Life Cycle for MapReduce and Hadoop are listed in the following order: Discovery, Data Preparation, Model Planning, Model Building, Communicate Results, and Operationalize.

True

False

10 points   

QUESTION 2

1. In the MapReduce paradigm, the time to complete a given task is reduced by breaking the task down into stages and then executing those stages in parallel. This activity is also called:

Data/Worker and Pattern/Chunks
Master/Slave and Master/Worker
Data Retrieval and Master/Pattern
Slave/Retrieval and Data/Chunks

10 points   
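
The idea of splitting a task into stages and running them in parallel can be illustrated with a minimal word-count sketch. This is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a local thread pool stands in for the cluster nodes that would each process one chunk of data.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle: group all emitted values by key across every chunk."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks, workers=2):
    # One map task per chunk; the pool runs the map tasks concurrently.
    # (Assumption: threads stand in for separate machines in a cluster.)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    chunks = ["big data big clusters", "data nodes store data"]
    print(word_count(chunks))
```

Each chunk is mapped independently, which is what allows the work to be spread across many machines; only the shuffle and reduce steps need to see results from more than one chunk.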

QUESTION 3

1. Under YARN, the content and structure of a MapReduce job is unchanged, but the scheduling and management of the job are quite different. The Job Tracker functionality is now shared by the Resource Manager and the Application Master (App Master). The key steps include which of the following:

All of the Above.
The Application Master starts the Map tasks and monitors their status.
From the Name Node, the Application Master determines on which nodes the HDFS blocks are stored and builds an execution plan and resource requirements.
The client submits a MapReduce job to the Resource Manager which schedules the job based on cluster activity.

10 points   

QUESTION 4

1. Query languages for Hadoop build on core Hadoop (MapReduce and HDFS) to enhance the development and manipulation of Hadoop clusters, and they have the following three components:

* Java Scripting

* VBA Coding

* Match Tables

True

False

10 points   

QUESTION 5

1. HBase represents a further layer of abstraction on Hadoop. HBase has been described as “a distributed column-oriented database [data storage system]” built on top of HDFS. HBase uses additional Apache Foundation open source frameworks such as Zookeeper, which is used as a coordination system to maintain consistency, Hadoop for MapReduce and HDFS, and Oozie for workflow management.

True

False

10 points   

QUESTION 6

1. With regard to “In-Database” functions, Greenplum supports certain set operations as part of a SELECT statement. Which of the following is NOT a part of a SELECT statement?

REMOVE ALL – Removes the previously executed set of data from the latest answer set.
EXCEPT – Returns rows from the first answer set and excludes those from the second.
INTERSECT – Returns rows that appear in all answer sets.
UNION ALL – Returns a combination of rows from multiple SELECT statements with repeating rows.

10 points   
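
The behavior described for EXCEPT, INTERSECT, and UNION ALL can be checked with a small sketch. It assumes Python's built-in `sqlite3` module as a stand-in for Greenplum, since these three operators are standard SQL; the table names `q1` and `q2` and the helper `run` are hypothetical.

```python
import sqlite3

# In-memory database with two small answer sets that overlap on 2 and 3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE q1 (id INTEGER);
    CREATE TABLE q2 (id INTEGER);
    INSERT INTO q1 VALUES (1), (2), (3);
    INSERT INTO q2 VALUES (2), (3), (4);
""")


def run(sql):
    """Run a query and return its single column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


# UNION ALL keeps repeated rows from both SELECT statements.
union_all = run("SELECT id FROM q1 UNION ALL SELECT id FROM q2")

# INTERSECT keeps only rows present in both answer sets.
intersect = run("SELECT id FROM q1 INTERSECT SELECT id FROM q2")

# EXCEPT keeps rows from the first answer set that are absent from the second.
except_rows = run("SELECT id FROM q1 EXCEPT SELECT id FROM q2")

if __name__ == "__main__":
    print(union_all, intersect, except_rows)
```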

QUESTION 7

1. In regard to the Greenplum SQL OLAP Grouping Extensions, Greenplum supports the following grouping extensions:

Rollup – This extension provides hierarchical grouping.
Cube – Complete cross-tabular grouping, or all possible grouping combinations, is provided with this extension.
Grouping Sets – Generalized grouping is provided with the GROUPING SETS clause.
All of the above.

10 points   
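
To make the difference between hierarchical and cross-tabular grouping concrete, the sketch below enumerates the grouping sets that a ROLLUP or CUBE over a list of columns expands to, then sums a measure once per set. It is plain Python rather than Greenplum SQL, and the function names (`rollup`, `cube`, `aggregate`) and sample rows are hypothetical.

```python
from itertools import combinations


def rollup(columns):
    """ROLLUP(a, b) groups by each prefix of the list: (a, b), (a), ()."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube(columns):
    """CUBE(a, b) groups by every subset: (a, b), (a), (b), ()."""
    return [
        subset
        for size in range(len(columns), -1, -1)
        for subset in combinations(columns, size)
    ]


def aggregate(rows, grouping_sets, measure):
    """Sum `measure` once per grouping set, as GROUPING SETS would."""
    results = []
    for group_cols in grouping_sets:
        totals = {}
        for row in rows:
            key = tuple(row[c] for c in group_cols)
            totals[key] = totals.get(key, 0) + row[measure]
        for key, total in sorted(totals.items()):
            # The empty dict marks the grand-total row (no grouping columns).
            results.append((dict(zip(group_cols, key)), total))
    return results


if __name__ == "__main__":
    sales = [
        {"region": "east", "product": "a", "sales": 10},
        {"region": "east", "product": "b", "sales": 5},
        {"region": "west", "product": "a", "sales": 7},
    ]
    for group, total in aggregate(sales, rollup(["region", "product"]), "sales"):
        print(group, total)
```

Passing an explicit list of tuples to `aggregate` instead of calling `rollup` or `cube` corresponds to writing the grouping sets out by hand.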

QUESTION 8

1. With regard to the techniques discussed for text analysis, it is common practice to store the parsed data from an unstructured source in a database for downstream analysis. With the advent of Hadoop and its ecosystem products, unstructured data can also be stored in external tables and accessed by traditional relational databases.

True

False

10 points   

QUESTION 9

1. A window function is described as a function that performs a calculation across a set of table rows that are somehow related to the current row. Unlike regular aggregate functions, however, a window function does not cause rows to become grouped into a single output row; the rows retain their separate identities.

True

False

10 points   
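
The contrast between an aggregate and a window calculation can be sketched in plain Python. The sample rows and the function names `aggregate_avg` and `window_avg` are hypothetical; the docstrings name the SQL form each function is meant to imitate.

```python
from collections import defaultdict

rows = [
    {"name": "Ana", "dept": "eng", "salary": 100},
    {"name": "Bo", "dept": "eng", "salary": 80},
    {"name": "Cy", "dept": "ops", "salary": 60},
]


def aggregate_avg(rows):
    """Imitates SELECT dept, AVG(salary) ... GROUP BY dept: one row per dept."""
    by_dept = defaultdict(list)
    for row in rows:
        by_dept[row["dept"]].append(row["salary"])
    return {dept: sum(vals) / len(vals) for dept, vals in by_dept.items()}


def window_avg(rows):
    """Imitates AVG(salary) OVER (PARTITION BY dept): every input row is kept."""
    dept_avg = aggregate_avg(rows)
    return [{**row, "dept_avg": dept_avg[row["dept"]]} for row in rows]


if __name__ == "__main__":
    print(aggregate_avg(rows))   # two entries, one per department
    for row in window_avg(rows):  # three rows, each with its department average
        print(row)
```

The aggregate version collapses three input rows into two results, while the window version returns all three rows with the related-rows calculation attached to each.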

QUESTION 10

1. With regard to user-defined functions and aggregates, Greenplum supports several function types, including procedural language functions, which are written in:

R
All of the Above.
None of the Above.
PL/pgSQL
Perl/Python
PL/Tcl

10 points   
