Assignment:
1. Obtain one of the data sets available at the UCI Machine Learning Repository and apply as many of the different visualization techniques described in the chapter as possible. The bibliographic notes and book Web site provide pointers to visualization software.
2. Identify at least two advantages and two disadvantages of using color to visually represent information.
3. What are the arrangement issues that arise with respect to three-dimensional plots?
4. Discuss the advantages and disadvantages of using sampling to reduce the number of data objects that need to be displayed. Would simple random sampling (without replacement) be a good approach to sampling? Why or why not?
5. Describe how you would create visualizations to display information that describes the following types of systems.
a) Computer networks. Be sure to include both the static aspects of the network, such as connectivity, and the dynamic aspects, such as traffic.
b) The distribution of specific plant and animal species around the world for a specific moment in time.
c) The use of computer resources, such as processor time, main memory, and disk, for a set of benchmark database programs.
d) The change in occupation of workers in a particular country over the last thirty years. Assume that you have yearly information about each person that also includes gender and level of education.
Be sure to address the following issues:
· Representation. How will you map objects, attributes, and relationships to visual elements?
· Arrangement. Are there any special considerations that need to be taken into account with respect to how visual elements are displayed? Specific examples might be the choice of viewpoint, the use of transparency, or the separation of certain groups of objects.
· Selection. How will you handle a large number of attributes and data objects?
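For the sampling question above, the following minimal Python sketch may help frame the discussion. It contrasts simple random sampling without replacement against a stratified alternative on a made-up data set in which one group is rare. The data, the group labels, the sample sizes, and the helper names `simple_random_sample` and `stratified_sample` are all illustrative assumptions, not part of the assignment.

```python
import random
from collections import Counter, defaultdict


def simple_random_sample(items, k, seed=0):
    """Draw k items uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(items, k)


def stratified_sample(items, key, per_group, seed=0):
    """Draw up to per_group items from each group defined by key(item)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    sample = []
    for members in groups.values():
        # Small groups are kept whole rather than oversampled.
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample


# Hypothetical data: 990 objects in a common group, 10 in a rare group.
data = [("common", i) for i in range(990)] + [("rare", i) for i in range(10)]

srs = simple_random_sample(data, 50)
strat = stratified_sample(data, key=lambda row: row[0], per_group=25)

srs_counts = Counter(label for label, _ in srs)
strat_counts = Counter(label for label, _ in strat)

print("simple random sample:", dict(srs_counts))
print("stratified sample:   ", dict(strat_counts))
```

With a 1% rare group and a sample of 50, a simple random sample is expected to contain well under one rare object on average, so the rare group can vanish from the display entirely. The stratified variant guarantees each group appears, at the cost of distorting the relative group sizes.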
Decision Tree Assignment
Play now? Play later?
You can become a millionaire! That’s what the junk mail said. But then there was the fine print:
If you send in your entry before midnight tonight, then here are your chances:
0.1% that you win $1,000,000
75% that you win nothing
Otherwise, you must PAY $1,000
But wait, there’s more! If you don’t win the million AND you don’t have to pay on your first attempt, then you can choose to play one more time. If you choose to play again, then here are your chances:
2% that you win $100,000
20% that you win $500
Otherwise, you must PAY $2,000
What is your expected outcome for attempting this venture? Solve this problem using a decision tree and clearly show all calculations and the expected monetary value at each node.
Use maximization of expected value as your decision criterion.
Answer these questions:
1) Should you play at all? (5%) If you play, what is your expected (net) monetary value? (15%)
2) If you play and don’t win at all on the first try (but don’t lose money), should you try again? (5%) Why? (10%)
3) Clearly show the decision tree (40%) and expected net monetary value at each node (25%)
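As a way to check hand calculations, the rollback of the tree can be sketched in Python. This is a minimal sketch under stated assumptions, not a model answer: it assumes each "Otherwise" branch takes the remaining probability (24.9% on the first play, 78% on the second), that entering costs nothing, and that declining to play is worth $0. The helper name `expected_value` is illustrative.

```python
def expected_value(branches):
    """Expected value of a chance node given (probability, payoff) pairs."""
    total_probability = sum(p for p, _ in branches)
    # Guard against branches that do not form a complete distribution.
    assert abs(total_probability - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * payoff for p, payoff in branches)


# Roll back from the leaves: evaluate the second play first.
# Assumption: "Otherwise" is the complement, 1 - 0.02 - 0.20 = 0.78.
second_play = expected_value([
    (0.02, 100_000),
    (0.20, 500),
    (0.78, -2_000),
])

# Decision node: play again only if it beats stopping (worth 0).
second_decision = max(second_play, 0.0)

# First play: the "win nothing" branch leads to the second decision node.
# Assumption: "Otherwise" is the complement, 1 - 0.001 - 0.75 = 0.249.
first_play = expected_value([
    (0.001, 1_000_000),
    (0.75, second_decision),
    (0.249, -1_000),
])

# Root decision node: play at all only if it beats not playing (worth 0).
play_decision = max(first_play, 0.0)

print(f"EMV of second play:         {second_play:,.2f}")
print(f"EMV at second decision:     {second_decision:,.2f}")
print(f"EMV of first play:          {first_play:,.2f}")
print(f"EMV at root decision:       {play_decision:,.2f}")
```

The values are computed leaf-first because the worth of the "win nothing" branch depends on the best choice available at the second decision node. Changing any of the assumed probabilities or payoffs changes every node upstream of it.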
Discussion:
What are the drawbacks of data mining? Describe them and provide examples of drawbacks that an organization may face when it uses data mining.
You must make at least two substantive responses to your classmates’ posts. Respond to these posts in any of the following ways:
· Build on something your classmate said.
· Explain why and how you see things differently.
· Ask a probing or clarifying question.
· Share an insight from having read your classmates’ postings.
· Offer and support an opinion.
· Validate an idea with your own experience.
· Expand on your classmates’ postings.
· Ask for evidence that supports the post.
Discussion Length (word count): At least 150 words
References: At least one peer-reviewed, scholarly journal reference.