Economics 430–Applied Econometrics
Homework #2 (100 Points)
Instructions: Please answer all of the following questions as best you can. If you
have any questions, please see me immediately. Partial credit will be awarded when it is
deserved. The point value for each question is in parentheses. All subquestions are of
equal value. This assignment is due February 18th.
1. (5) Would excluding variables that belong in the model produce unbiased
OLS estimators?
2. (10) If including an irrelevant variable still leaves the OLS estimators
unbiased, then why don’t we just include as many variables as we can?
3. (25) Suppose I am interested in modeling how wages are determined. My model of
interest is
log(wage) = β0 + β1 educ + u, (1)
where wage is the hourly wage earned and educ is total years of education a worker
has.
(a) Why do we expect β1 > 0?
(b) Should years of experience working be included in a model of wage determination
(why or why not)?
(c) If years of experience working is excluded from my model, what is the likely
effect on the bias of the OLS estimator of β1?
(d) Interpret β1 in the above model.
(e) If I inform you that years of education and years of work experience are
correlated, does that mean that years of work experience should be excluded from
the model to eliminate collinearity (why or why not)?
(f) My estimated regression is
\widehat{log(wage)} = 0.284 + 0.073 · educ. (2)
Interpret the coefficient on educ from my estimated regression.
(g) R² = 0.304 in this model. What does this mean?
(h) Would R² decrease if I added years of experience as a regressor in my model?
(i) Would R² decrease if I added the individual's height as a regressor in my model?
(j) How would your interpretation of β1 change if you used log(educ) as a regressor
instead of educ?
4. (60) Use the Boston dataset in the MASS package in R to answer the following
questions. You may type ?Boston in R to get complete definitions of the variables. The
following model describes the median housing price (medv) across communities in
the metro Boston area in terms of the amount of pollution (nox for nitrogen oxides
concentration) and the average number of rooms in a house in the community (rm):
log(medv) = β0 + β1 log(nox) + β2 rm + ε. (3)
(a) What are the expected signs of β1 and β2 in this model?
(b) What is the interpretation of β1?
(c) What is the interpretation of β2?
(d) Estimate this model and report the coefficient estimates for the three
parameters, their standard errors, and R².
(e) Interpret R².
(f) Are you concerned that collinearity is present in this setting?
(g) Why would nox and rooms be negatively correlated?
(h) Would estimating your model of housing prices omitting rooms yield an upward
or downward bias in β̂1 if rooms and nox were negatively correlated? Why?
(i) Estimate your model excluding rooms and report the coefficient estimates
for the two parameters, their standard errors, and R².
(j) Is your estimate of β1 in the univariate model closer to the truth than in the
multiple regressor model you estimated earlier?
(k) What do you make of the large decrease in R² when you estimate the univariate
model?
(l) Is it possible to include rm² in model (3)? If yes, why is it useful?
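A minimal sketch of how the estimation in question 4 might be set up in R, assuming the MASS package is installed. The object name fit is only an illustrative choice and is not part of the assignment; library(), lm(), summary(), and coef() are standard R functions.

```r
# Load the MASS package, which provides the Boston data frame
library(MASS)

# Inspect the three variables that appear in model (3)
summary(Boston[, c("medv", "nox", "rm")])

# Estimate model (3) by OLS; `fit` is a hypothetical object name
fit <- lm(log(medv) ~ log(nox) + rm, data = Boston)

# Coefficient estimates together with their standard errors
coef(summary(fit))

# R-squared of the fitted model
summary(fit)$r.squared
```

Removing rm from the formula gives the single-regressor version of the model asked for in part (i).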