For the streamflow versus suspended solids (SS) data described in Example 14.1 of Section 14.2.1, an IRLS regression of SS versus Flow is to be computed using the lower 90% of the data (i.e., data for streamflow not exceeding approximately 65 m³/s), as described in Example 14.1. A scatter plot of the data shows potential outliers and evidence that the variance of SS increases as flow increases; IRLS regression can address both issues. The residuals from an initial OLS regression of the data are not normally distributed, but because the sample is quite large (about 400 observations), approximate normality may be assumed based on the central limit theorem described in Section 4.4.1.2. Although a large sample is usually desirable, in this case the data are so “dense” that the OLS residuals show evidence of serial correlation, whereas IRLS, like many statistical procedures, assumes an absence of serial correlation (i.e., assumes that the residuals or errors are independent). For this sample, reducing the sample size, for instance by using data at monthly instead of daily intervals, should eliminate serial correlation, but for the purposes of this exercise we use the data “as is.”

(A) Compute the OLS and IRLS regressions using R, as described in Section 12.3.1.1 and in Example 14.3 of Section 14.4, respectively.

(B) Plot the OLS and IRLS regression lines on a scatter plot of the data to assess visually how the IRLS line deviates from the OLS line.

(C) Obtain the final weights used for the IRLS regression from R, as described in Example 14.3, compute the weighted residuals, and plot the weighted residuals versus the predicted Y (i.e., the predicted SS values). Compare this plot with the corresponding plot based on the OLS regression, and verify that IRLS has improved the homoscedasticity of the errors.
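To make the quantities in parts (A) and (C) concrete, the following is a minimal sketch of what an IRLS fit does internally. It is not the R procedure of Example 14.3. It is written in plain Python, and several choices are assumptions made only for illustration: the Huber weight function, its tuning constant `k = 1.345`, the scale estimate (median absolute residual divided by 0.6745), and the helper names `wls_line` and `irls_line`. The ten data points are invented, with one deliberate outlier, and are not the streamflow data.

```python
import statistics

def wls_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def irls_line(x, y, k=1.345, tol=1e-8, max_iter=100):
    """IRLS with Huber weights (an assumed choice); returns (a, b, weights)."""
    w = [1.0] * len(x)
    a, b = wls_line(x, y, w)  # equal weights: the OLS starting fit
    for _ in range(max_iter):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale estimate from the current residuals.
        scale = statistics.median([abs(r) for r in resid]) / 0.6745
        if scale == 0.0:
            break  # residuals too small to rescale; keep the current fit
        u = [abs(r) / scale for r in resid]
        # Huber weights: 1 for small residuals, shrinking for large ones.
        w = [1.0 if ui <= k else k / ui for ui in u]
        a_new, b_new = wls_line(x, y, w)
        converged = max(abs(a_new - a), abs(b_new - b)) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b, w

# Invented data: an exact line y = 1 + 2x with one outlier at x = 5.
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
y[5] += 100.0

a_ols, b_ols = wls_line(x, y, [1.0] * len(x))
a_irls, b_irls, w = irls_line(x, y)

# Weighted residuals for part (C): sqrt(final weight) times raw residual.
fitted = [a_irls + b_irls * xi for xi in x]
weighted_resid = [wi ** 0.5 * (yi - fi) for wi, yi, fi in zip(w, y, fitted)]

print("OLS :", a_ols, b_ols)
print("IRLS:", a_irls, b_irls)
print("final weights:", w)
```

In this toy case the outlier pulls the OLS line away from the underlying slope, while the IRLS fit gives that point a final weight far below 1 and stays close to the line through the other points. The final weights and the weighted residuals are the same kind of quantities that part (C) asks to be extracted from the R fit and plotted against the predicted values.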