For the Los Angeles ozone data used in Example 15.8 of Section 15.6.3, a best subsets regression of the natural log of ozone (i.e., lnOzone) versus the six predictor variables vh, temp, humidity, ibt, vis, and doy is to be computed using the methods described in Example 17.3 of Section 17.4.

(A) Plot and review the scatterplot matrix, Component-plus-Residual plot, and Added Variable plot as described in Sections 15.4 and 15.5.1 to verify that the relationships between lnOzone and each of the six X variables are reasonably linear.

(B) Compute the best subsets regression and, using the results from the best subsets procedure and a plot of Cp versus p, select the best available model from the candidates. Tip: The Cp versus p plot has a point that appears to be labeled 3456 but is actually 23456; the leading 2 was truncated by the plot.

(C) Is the selected model the same as the model obtained in Example 15.8?

(D) Compute a multiple linear regression as described in Section 15.4 using the selected variables, and perform the usual series of diagnostic tests and checks as indicated in Problem 17.2 to assess whether the regression coefficients are reasonable, whether the linear regression assumptions are satisfied, and whether the fit was unduly affected by influential data points.

(E) When you use your validated regression equation to predict ozone concentration for specified values of the X variables, and considering that ozone had to be log-transformed in order to compute the linear regression, how would your prediction be affected by transformation bias?
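As a rough illustration of the computation behind part (B), the sketch below enumerates every subset of six predictors and ranks them by Mallows' Cp, using the common definition Cp = RSS_p / s^2 - (n - 2p), where s^2 is the residual mean square of the full model and p counts the fitted coefficients including the intercept. It is not the procedure of Example 17.3, which may rely on different software. The data here are synthetic stand-ins generated only so the code runs (the real ozone data are not reproduced), and the helper names `rss` and `best_subsets_cp` are hypothetical.

```python
from itertools import combinations

import numpy as np


def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def best_subsets_cp(X, y, names):
    """Return (Cp, p, variables) for every non-empty subset, sorted by Cp."""
    n, k = X.shape
    # s^2: residual mean square of the full model with all k predictors
    sigma2_full = rss(X, y) / (n - k - 1)
    rows = []
    for size in range(1, k + 1):
        for idx in combinations(range(k), size):
            p = size + 1  # number of coefficients, intercept included
            cp = rss(X[:, list(idx)], y) / sigma2_full - (n - 2 * p)
            rows.append((cp, p, tuple(names[i] for i in idx)))
    return sorted(rows)


# Synthetic stand-in data (ASSUMPTION: not the Los Angeles ozone data).
# Only "temp" and "ibt" truly drive the made-up response.
names = ["vh", "temp", "humidity", "ibt", "vis", "doy"]
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, len(names)))
y = 1.0 + 2.0 * X[:, 1] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=n)

results = best_subsets_cp(X, y, names)
for cp, p, subset in results[:5]:
    print(f"Cp = {cp:7.2f}   p = {p}   {subset}")
```

Candidate models are usually those with a small Cp that lies close to the line Cp = p; by construction the full model always has Cp equal to p, so it lies exactly on that line and cannot be judged by this criterion alone.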