Daily Archives: May 20, 2022

Given the three possible causes of a spike in favorable ratings (restaurant campaign, actual customer sentiment, and random variation), discuss the role of a hypothesis test in distinguishing among the three causes.

A social media site that allows community members to rate restaurants wants to ensure that the reviews are genuine reviews of the community and not favorable reviews orchestrated by the friends of the restaurant owners. It uses several metrics when examining reviews, one of which is the “percent favorable.” It theorizes that a spike in the percent favorable might represent a campaign by the restaurant, but also recognizes that it might reflect genuine customer sentiment, or just random variation. So it collects 4 days’ worth of such data on a periodic basis and subjects it to a hypothesis test. For one restaurant, the percent favorable has stood at 60% for the last year. One recent 4-day sample showed 72% favorable—23 favorable reviews and 9 unfavorable.

(a) Specify and….
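One way to frame the test described above is as an exact one-sided binomial test: if the true favorable rate were still 60%, how likely is a sample of 32 reviews with 23 or more favorable? The sketch below uses only the numbers given in the problem (23 favorable, 9 unfavorable, a baseline of 60%). The choice of an exact binomial test and the function name `binomial_upper_tail` are assumptions made for illustration; the original exercise may intend a resampling or normal-approximation approach instead.

```python
from math import comb


def binomial_upper_tail(successes: int, n: int, p0: float) -> float:
    """Return P(X >= successes) for X ~ Binomial(n, p0)."""
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(successes, n + 1)
    )


# Figures taken from the problem statement.
favorable, unfavorable = 23, 9
n = favorable + unfavorable          # 32 reviews in the 4-day sample
p0 = 0.60                            # long-run percent favorable (null hypothesis)

p_value = binomial_upper_tail(favorable, n, p0)

print(f"observed proportion = {favorable / n:.3f}")
print(f"one-sided p-value   = {p_value:.4f}")
```

A small p-value would rule out random variation as a plausible explanation, but it cannot by itself separate a campaign from a genuine shift in customer sentiment; that distinction needs other evidence.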

A software service firm has been growing relatively rapidly and concentrating on winning contracts and delivering on them.

A software service firm has been growing relatively rapidly and concentrating on winning contracts and delivering on them. It has paid less attention to getting paid on time, and the investors who own the firm have established a target of $2500, maximum, on the average level of accounts receivable (the average per account, not the total across all accounts). If it is clear that this average is being exceeded, they want a corrective action. On the other hand, they agree with the management that the primary focus should be on winning contracts and delivering on them. So it has been decided to take periodic samples of accounts receivable to watch for clear evidence that the target is being exceeded. Here is the most recent one (in $):

950,….

What would be more reasonable here, to think of height as dependent on age or age dependent on height?

The following questions use this table, which can be downloaded here in Excel format as girls.xls. These data are the heights (in cm) of girls of different ages (in years). For the following questions, you may use software of your choice. (Our source is Siegel and Morgan, Statistics and Data Analysis: An Introduction, 2nd ed., John Wiley and Sons, 1996. Their source is the 1980 World Almanac.)

(a) What would be more reasonable here, to think of height as dependent on age or age dependent on height?

(b) Make a scatterplot of these data and describe what you see.

(c) Find the correlation between these two variables (age and height).

(d) It only makes sense to compute a correlation under certain circumstances. Why is it reasonable here?

….
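Part (c) asks for the correlation between age and height. As a minimal sketch of what that calculation involves, the function below computes Pearson's correlation coefficient from first principles. The numbers in `ages` and `heights_cm` are made-up placeholder values, not the contents of girls.xls, and the names `pearson_r`, `ages`, and `heights_cm` are illustrative assumptions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# HYPOTHETICAL placeholder values, NOT the data in girls.xls.
ages = [2, 4, 6, 8, 10]
heights_cm = [86, 102, 115, 127, 138]

r = pearson_r(ages, heights_cm)
print(f"r = {r:.3f}")
```

The coefficient only summarizes a straight-line relationship, which is why part (d) asks whether the scatterplot looks roughly linear before the number is trusted.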

What is the hypothesized direction of causality? Is this direction confirmed by the results of the hypothesis test?

Activity in the portions of the brain connected with social perception was tracked for a sample of university students, and the number of Facebook friends was recorded for the same sample. (From R. Kanai, B. Bahrami, R. Roylance, G. Rees, “Online social network size is reflected in human brain structure,” Proceedings of the Royal Society B, Published online http://rspb.royalsocietypublishing.org/content/early/2011/10/12/rspb.2011.1959.full)

Review the data in Brain-Facebook.xls, then

(a) Plot the two variables in a scatterplot.

(b) The metric used for GM density is somewhat complex, and it is not necessary to understand it for the purposes of this exercise. However, looking at the scale for GM density, make a guess at how those numbers were calculated. Hint: Review Section 4.4.

(c) Calculate the correlation between brain activity….

A retail company wants to find out whether clickthroughs are a good substitute for sales in evaluating the effectiveness of an online ad. One clickthrough is one person clicking on an ad to learn more.

A retail company wants to find out whether clickthroughs are a good substitute for sales in evaluating the effectiveness of an online ad. One clickthrough is one person clicking on an ad to learn more. Clickthroughs have the advantage of being much more plentiful than sales and accumulating much more quickly, allowing the firm to judge quickly whether an ad is effective. Here are some data on sales and clickthroughs for 13 ads (you can download an Excel workbook with the data, clickthroughs.xls):

(a) Calculate (using a software program, if you wish) the correlation coefficient and explain how it can be used to assess whether clickthroughs are a good substitute for sales.

(b) Assume now that the company has determined that clickthroughs are, indeed, an adequate proxy….

The following questions use this table, which can be downloaded here in Excel format as girls.xls.

The following questions use this table, which can be downloaded here in Excel format as girls.xls. These data are the heights (in cm) of girls of different ages (in years). For the following questions, you may use software of your choice. (Our source is Siegel and Morgan, Statistics and Data Analysis: An Introduction, 2nd ed., John Wiley and Sons, 1996. Their source is the 1980 World Almanac.)

(a) Find the least squares regression equation for predicting height from age.

(b) Interpret the regression coefficients in terms of the growth of girls.

(c) Use the regression equation to predict the “height” of a 100-year-old “girl.” Comment on the result.

(d) Plot the residuals from the regression against age. (You may plot standardized residuals if your software prefers those,….
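Parts (a) through (d) can be sketched with a small least-squares fit written from scratch. The values in `ages` and `heights_cm` below are made-up placeholders, not the girls.xls data, and `fit_line` is a hypothetical helper name, so the fitted coefficients here say nothing about the real answer. The point is only to show the mechanics: fit the line, extrapolate to age 100, and compute residuals.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# HYPOTHETICAL placeholder values, NOT the data in girls.xls.
ages = [2, 4, 6, 8, 10]
heights_cm = [86, 102, 115, 127, 138]

intercept, slope = fit_line(ages, heights_cm)

# Extrapolating far outside the observed age range, as in part (c).
predicted_at_100 = intercept + slope * 100

# Residuals (observed minus fitted), as needed for part (d).
residuals = [y - (intercept + slope * x) for x, y in zip(ages, heights_cm)]

print(f"height = {intercept:.2f} + {slope:.2f} * age")
print(f"predicted height at age 100: {predicted_at_100:.1f} cm")
print("residuals:", [round(e, 2) for e in residuals])
```

With any steadily increasing childhood data, the prediction at age 100 comes out absurdly large, which illustrates why a fitted line should not be trusted beyond the range of ages that produced it.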

Explain in words what interaction would mean in terms of this study and these variables. Give an example.

Use the antimony data discussed in this week’s lesson. You can find the data at the book web site as a plain text file and as an Excel and CSV spreadsheet. Run a two-way ANOVA including interaction and use this model for all the following questions even if you decide the interaction term is expendable. Additional procedures may also be needed to fully answer the following questions.

Important: The antimony data are available in two formats:

• Standard statistical/database format for data, where each row is an observation.

• Table form, in which each column is a different cooling method, each row is a specific level of antimony, and the values in the table are the measured strengths. There are three observations for each combination of antimony and….

This problem uses an expanded version of the Boston Housing data. This version of the data contains five predictor variables for MEDV.

This problem uses an expanded version of the Boston Housing data. This version of the data contains five predictor variables for MEDV. The five variables are:

CRIM—crime rate per 1000 persons

NOX—nitric oxide concentration (parts per 10 million)

RM—average number of rooms per dwelling

PTRATIO—pupil–teacher ratio by town

LSTAT—% lower status of the population

(a) Make a scatterplot of MEDV versus CRIM. What do you see? On the basis of your scatterplot, does CRIM appear helpful in predicting MEDV?

(b) Run a regression of MEDV as a function of CRIM. Report a p-value and interpret it. From this, does CRIM appear helpful in predicting MEDV?

(c) Make a scatterplot of MEDV versus LSTAT. What do you see? On the basis of your scatterplot, does LSTAT appear helpful in….

Perform a multiple linear regression. Specify appropriate independent and response variables. Report the resulting equation.

Use the dataset “Tayko-known.xls” for the following problems.

(a) Perform a multiple linear regression. Specify appropriate independent and response variables. Report the resulting equation. (Hint: this duplicates an illustration in the chapter.)

(b) Calculate, or locate in your regression output, the following statistics, and use them in a sentence or two describing the Tayko-known data and your regression. The goal is to convey to your reader an understanding of average spending, how variable it is, and how much a typical prediction might be in error.

– standard deviation of the spending

– mean spending

– RMSE

(c) Use the regression equation to predict spending levels for the customer records in “Tayko-Unknown.xls.”

(d) Sort the results in (c) by predicted spending, and report the top 10 customers for predicted….

The following regression equations relate to concentrations of chlorophyll (a measure of lake quality—higher levels indicate algae and possible eutrophication), phosphate, and nitrogen.

The following regression equations relate to concentrations of chlorophyll (a measure of lake quality—higher levels indicate algae and possible eutrophication), phosphate, and nitrogen. (Based partially on a problem in Manly, B., Statistics for Environmental Science and Management, 2nd ed., CRC Press, p.71)

The specification of the model by the researcher:

(1) CH = b0 + b1(PH) + b2(NT)

The regression output:

(2) CH = -9.386 + 0.333PH + 1.200NT

Output from an additional regression:

(3) CH = -16.244 + 0.313PH + 0.960NT + 0.412(NT)(PH)
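To see what the extra (NT)(PH) term in Equation (3) changes, it can help to compare how each fitted equation responds to a one-unit increase in NT. The sketch below uses the coefficients exactly as printed in Equations (2) and (3). The PH and NT input values are arbitrary illustrations (the source does not state units or typical ranges), and the function names are assumptions.

```python
def ch_additive(ph: float, nt: float) -> float:
    """Equation (2): CH = -9.386 + 0.333*PH + 1.200*NT."""
    return -9.386 + 0.333 * ph + 1.200 * nt


def ch_interaction(ph: float, nt: float) -> float:
    """Equation (3): CH = -16.244 + 0.313*PH + 0.960*NT + 0.412*NT*PH."""
    return -16.244 + 0.313 * ph + 0.960 * nt + 0.412 * nt * ph


def nt_effect_interaction(ph: float) -> float:
    """Change in CH per unit NT under Equation (3): 0.960 + 0.412*PH."""
    return 0.960 + 0.412 * ph


# Arbitrary illustrative PH levels; not taken from the source.
for ph in (1.0, 5.0):
    additive_step = ch_additive(ph, 3.0) - ch_additive(ph, 2.0)
    interaction_step = ch_interaction(ph, 3.0) - ch_interaction(ph, 2.0)
    print(
        f"PH={ph}: +1 NT changes CH by {additive_step:.3f} in Eq. (2), "
        f"{interaction_step:.3f} in Eq. (3)"
    )
```

In Equation (2) the step is the same at every PH level, while in Equation (3) it grows with PH, which is what an interaction between the two predictors means.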

Questions (answer each with a single short sentence, or even shorter):

(a) Considering the three variables and how they are arranged in Equation (1), what does the researcher believe is the “cause and effect” relationship among the three variables (i.e., what causes what)?

(b) What….