Breakfast cereal manufacturers publish nutrition information on each box of their product. As we saw in Chapter 17, there is a long history of cereals being associated with nutrition. Here’s a regression to predict the number of Calories in breakfast cereals from their Sodium, Potassium, and Sugar content, and some diagnostic plots.
The shaded part of the histogram corresponds to the two cereals plotted with x’s in the Normal probability plot of the leverages and the residuals plot. These are All-Bran with Extra Fiber and All-Bran.
a) What do the displays say about the influence of these two cereals on this regression? (The histogram is of the studentized residuals.)
Here’s another regression with dummy variables defined for each of the two bran cereals.
b) Explain what the coefficients of the bran cereal dummy variables mean.
c) Which regression would you select for understanding the interplay of these nutrition components? Explain. (Note: Both are defensible.)
d) As you can see from the scatterplot, there’s another cereal with high potassium. Not too surprisingly, it is 100% Bran. But it does not have leverage as high as the other two bran cereals. Do you think it should be treated like them (i.e., removed from the model, fit with its own dummy, or left in the model with no special attention, depending on your answer to part c)? Explain.
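
The quantities this exercise relies on (leverages, studentized residuals, and a dummy variable for a single case) can be computed directly. The following is a minimal sketch on synthetic data, not the cereal data. The sample size, the coefficients, and every variable name are assumptions made for illustration, and case 0 is deliberately given extreme predictor values so that it has high leverage.

```python
import numpy as np

# Synthetic data (NOT the cereal data): 30 cases, 3 predictors.
rng = np.random.default_rng(0)
n = 30
X = rng.normal(size=(n, 3))
X[0] = [4.0, 5.0, -3.0]          # give case 0 extreme predictor values
y = 100 + X @ np.array([2.0, -1.0, 3.0]) + rng.normal(scale=2.0, size=n)
y[0] += 15.0                     # and make its response unusual as well


def fit(design, response):
    """Least-squares coefficients for the given design matrix."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


# Original regression: intercept plus three predictors.
A = np.column_stack([np.ones(n), X])
p = A.shape[1]
coef = fit(A, y)
resid = y - A @ coef

# Leverages are the diagonal entries of the hat matrix.
H = A @ np.linalg.inv(A.T @ A) @ A.T
lev = np.diag(H)

# Externally studentized residuals: each residual is scaled by an
# error estimate computed with that case left out.
s2 = resid @ resid / (n - p)
s2_deleted = (s2 * (n - p) - resid**2 / (1 - lev)) / (n - p - 1)
stud = resid / np.sqrt(s2_deleted * (1 - lev))

# Second regression: add a dummy variable that is 1 only for case 0.
dummy = np.zeros(n)
dummy[0] = 1.0
A_dummy = np.column_stack([A, dummy])
coef_dummy = fit(A_dummy, y)
resid_dummy = y - A_dummy @ coef_dummy

# Third regression: drop case 0 entirely.
coef_drop = fit(A[1:], y[1:])
prediction_error = y[0] - A[0] @ coef_drop

print("leverage of case 0:", lev[0])
print("studentized residual of case 0:", stud[0])
print("dummy coefficient:", coef_dummy[-1])
print("prediction error from the fit without case 0:", prediction_error)
```

In this sketch the dummy variable fits case 0 exactly, so its residual is zero, the remaining coefficients match those from the regression with that case omitted, and the dummy coefficient equals the amount by which that omitted-case model misses the case's observed value.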