Using Indicator Variables
Consider the square footage at first:
– β2 is the value of an additional square foot of living area and β1 is the value of the land alone
How do we account for location, which is a qualitative variable?
– Indicator variables are used to account for qualitative factors in econometric models
– They are often called binary or dichotomous variables, because they take just two values, usually one or zero, to indicate the presence or absence of a characteristic or whether a condition is true or false
– They are also called dummy variables, to indicate that we are creating a numeric variable for a qualitative, non-numeric characteristic
– We use the terms indicator variable and dummy variable interchangeably
Generally, we define an indicator variable D as:
– So, to account for location, a qualitative variable, we would have:
Adding our indicator variable to our model:
If our model is correctly specified, then:
Adding an indicator variable causes a parallel shift in the relationship by the amount δ
An indicator variable like D that is incorporated into a regression model to capture a shift in the intercept as the result of some qualitative factor is called an intercept indicator variable, or an intercept dummy variable
The least squares estimator’s properties are not affected by the fact that one of the explanatory variables consists only of zeros and ones
– D is treated as any other explanatory variable.
– We can construct an interval estimate for δ, the coefficient of D, or we can test the significance of its least squares estimate
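As a minimal sketch of the point above, the following fits the intercept-indicator model by least squares on simulated data and computes a standard error and t-statistic for the estimate of δ. The data, the "true" coefficient values, and all variable names are assumptions made for illustration; they do not come from the house price data set discussed in these slides.

```python
import numpy as np

# Hypothetical simulated data; the coefficient values below are assumed.
rng = np.random.default_rng(0)
n = 200
sqft = rng.uniform(1000, 3000, size=n)        # house size in square feet
d = rng.integers(0, 2, size=n)                # 1 = desirable neighborhood, 0 = otherwise
beta1, delta, beta2 = 20000.0, 30000.0, 80.0  # assumed "true" parameters
e = rng.normal(0, 5000, size=n)               # random error
price = beta1 + delta * d + beta2 * sqft + e

# D enters the design matrix like any other explanatory variable.
X = np.column_stack([np.ones(n), d, sqft])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
b1, d_hat, b2 = coef

# Usual least squares variance estimate and standard error for delta.
resid = price - X @ coef
sigma2_hat = resid @ resid / (n - X.shape[1])
cov = sigma2_hat * np.linalg.inv(X.T @ X)
se_delta = np.sqrt(cov[1, 1])
t_delta = d_hat / se_delta                    # t-statistic for H0: delta = 0

print(f"delta_hat = {d_hat:.1f}, se = {se_delta:.1f}, t = {t_delta:.2f}")
```

An interval estimate for δ would then be d_hat plus or minus the appropriate t critical value times se_delta.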
FIGURE 7.1 An intercept indicator variable
The value D = 0 defines the reference group, or base group
We could pick any base
For example:
Then our model would be:
Suppose we included both D and LD:
– The variables D and LD are such that D + LD = 1
– Since the intercept variable x1 = 1, we have created a model with exact collinearity
– We have fallen into the dummy variable trap.
– By including only one of the indicator variables, the omitted variable defines the reference group and we avoid the problem
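The exact collinearity can be checked numerically. The sketch below builds a small, made-up design matrix containing the intercept, D, LD and SQFT, and compares its rank with that of the matrix that omits LD. The six observations are hypothetical.

```python
import numpy as np

# Hypothetical observations, chosen only to illustrate the rank problem.
d = np.array([1, 0, 1, 1, 0, 0])          # 1 = desirable neighborhood
ld = 1 - d                                # 1 = not in desirable neighborhood
sqft = np.array([1500.0, 2100.0, 1800.0, 2500.0, 1200.0, 3000.0])
ones = np.ones_like(sqft)                 # intercept variable x1 = 1

X_trap = np.column_stack([ones, d, ld, sqft])  # includes both D and LD
X_ok = np.column_stack([ones, d, sqft])        # LD omitted: reference group is D = 0

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)

# Four columns but rank below four: D + LD reproduces the intercept column.
print(X_trap.shape[1], rank_trap)
print(X_ok.shape[1], rank_ok)
```

Because X_trap does not have full column rank, the least squares normal equations have no unique solution; dropping either D or LD restores full rank.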
Suppose we specify our model as:
– The new variable (SQFT × D) is the product of house size and the indicator variable
– It is called an interaction variable, as it captures the interaction effect of location and size on house price
– Alternatively, it is called a slope-indicator variable or a slope dummy variable, because it allows for a change in the slope of the relationship
Now we can write:
FIGURE 7.2 (a) A slope-indicator variable (b) Slope- and intercept-indicator variables
The slope can be expressed as:
If house location affects both the intercept and the slope, then both effects can be incorporated into a single model:
– The variable (SQFT × D) is the product of house size and the indicator variable, and is called an interaction variable
– Alternatively, it is called a slope-indicator variable or a slope dummy variable
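A short sketch of a model with both an intercept indicator and a slope indicator follows. It uses noise-free simulated data so that least squares recovers the assumed coefficients up to rounding, which makes it easy to read off the intercept and slope for each group. The coefficient values and names are assumptions for illustration only.

```python
import numpy as np

# Hypothetical, noise-free data generated from assumed coefficients.
rng = np.random.default_rng(42)
n = 50
sqft = rng.uniform(1000, 3000, size=n)
d = np.tile([0, 1], n // 2)               # alternate groups so both are present
beta1, delta, beta2, gamma = 20000.0, 30000.0, 80.0, 25.0
price = beta1 + delta * d + beta2 * sqft + gamma * sqft * d

# Columns: intercept, D, SQFT, and the interaction SQFT x D.
X = np.column_stack([np.ones(n), d, sqft, sqft * d])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
b1, d_hat, b2, g_hat = coef

# Implied regression function for each group.
intercept_by_group = {0: b1, 1: b1 + d_hat}
slope_by_group = {0: b2, 1: b2 + g_hat}
print(intercept_by_group, slope_by_group)
```

The slope for the D = 1 group is the sum of the SQFT coefficient and the interaction coefficient, and its intercept is the sum of the constant and the coefficient on D.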
We can see that:
Consider the wage equation:
– The expected value is:
Applying Indicator Variables
Table 7.3 Wage Equation with Race and Gender
Recall that the test statistic for a joint hypothesis is:
To test the J = 3 joint null hypotheses H0: δ1 = 0, δ2 = 0, γ = 0, we use SSE_U = 130194.7 from Table 7.3
– The SSE_R comes from fitting the model:
for which SSE_R = 135771.1
Therefore:
– The 1% critical value (i.e., the 99th percentile value) is F(0.99,3,995) = 3.80.
– Since F = 14.21 exceeds the critical value 3.80, we reject the null hypothesis and conclude that race and/or gender affect the wage equation.
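The arithmetic of this test can be reproduced from the reported sums of squared errors. In the sketch below, the function name f_statistic is a hypothetical helper, and the critical value 3.80 is taken from the slide rather than computed.

```python
def f_statistic(sse_r, sse_u, j, df_u):
    """F-statistic for J joint restrictions.

    sse_r: SSE from the restricted model
    sse_u: SSE from the unrestricted model
    j:     number of restrictions
    df_u:  degrees of freedom (N - K) of the unrestricted model
    """
    return ((sse_r - sse_u) / j) / (sse_u / df_u)

# Values reported in the slides for the race and gender test.
f_race_gender = f_statistic(sse_r=135771.1, sse_u=130194.7, j=3, df_u=995)

critical_1pct = 3.80                      # F(0.99, 3, 995), as stated in the slides
reject_at_1pct = f_race_gender > critical_1pct

print(round(f_race_gender, 2), reject_at_1pct)
```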
Now consider our wage equation:
“Are there differences between the wage regressions for the south and for the rest of the country?”
– If there are no differences, then the data from the south and other regions can be pooled into one sample, with no allowance made for differing slope or intercept
– To test this, we specify:
(7.10)
Now examine this version of Eq. 7.10:
Table 7.5 Comparison of Fully Interacted to Separate Models
From the table, we note that:
We can test for a southern regional difference.
We estimate Eq. 7.10 and test the joint null hypothesis
against the alternative that at least one θi ≠ 0
This is the Chow test
The F-statistic is:
– The 10% critical value is Fc = 1.85. Since F = 0.3203 is less than 1.85, we fail to reject the hypothesis that the wage equation is the same in the southern region and the remainder of the country at the 10% level of significance
– The p-value of this test is p = 0.9009
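The logic of the Chow test can be sketched on simulated data. The example below uses a simplified wage equation with EDUC as the only regressor, which is an assumption made to keep the sketch short and is not the model in Eq. 7.10. It shows that the SSE from the fully interacted model equals the sum of the SSEs from separate regressions on the two subsamples, and then forms the F-statistic. All data and names are hypothetical.

```python
import numpy as np

def sse(X, y):
    """Sum of squared least squares residuals from regressing y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

# Hypothetical data: the same wage equation is assumed to hold in both regions.
rng = np.random.default_rng(1)
n = 300
educ = rng.uniform(8, 20, size=n)
south = rng.integers(0, 2, size=n)        # 1 = south, 0 = rest of the country
wage = -5.0 + 2.0 * educ + rng.normal(0, 4, size=n)

ones = np.ones(n)
X_r = np.column_stack([ones, educ])                       # pooled (restricted) model
X_u = np.column_stack([ones, educ, south, educ * south])  # fully interacted model

sse_r = sse(X_r, wage)
sse_u = sse(X_u, wage)

# Separate regressions on each subsample.
is_south = south == 1
sse_south = sse(X_r[is_south], wage[is_south])
sse_nonsouth = sse(X_r[~is_south], wage[~is_south])

j = X_u.shape[1] - X_r.shape[1]           # number of restrictions
chow_f = ((sse_r - sse_u) / j) / (sse_u / (n - X_u.shape[1]))

print(sse_u, sse_south + sse_nonsouth, chow_f)
```

The equality of sse_u and sse_south + sse_nonsouth is why the unrestricted SSE for the test can be taken either from the fully interacted regression or from the two separate regressions.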
Remark:
– The usual F-test of a joint hypothesis relies on the assumptions MR1–MR6 of the linear regression model
– Of particular relevance for testing the equivalence of two regressions is assumption MR3, that the variance of the error term, var(e_i) = σ^2, is the same for all observations
– If we are considering possibly different slopes and intercepts for parts of the data, it might also be true that the error variances are different in the two parts of the data
– In such a case, the usual F-test is not valid.
Consider the wage equation in log-linear form:
– What is the interpretation of δ?
Expanding our model, we have:
Log-linear Models
Let’s first write the difference between females and males:
– This is approximately the percentage difference
The estimated model is:
– We estimate that there is a 24.32% differential between male and female wages
For a better calculation, the wage difference is:
– But, by the property of logs:
Subtracting 1 from both sides:
– The percentage difference between wages of females and males is 100(e^δ – 1)%
– We estimate the wage differential between males and females to be:
100(e^δ – 1)% = 100(e^(–0.2432) – 1)% = –21.59%
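The two calculations can be compared directly. The sketch below uses the estimate of δ reported in the slides, -0.2432; the variable names are chosen here for illustration.

```python
import math

delta_hat = -0.2432                       # estimated coefficient on FEMALE

# Approximate percentage difference: 100 * delta
approx_pct = 100 * delta_hat

# Exact percentage difference: 100 * (exp(delta) - 1)
exact_pct = 100 * (math.exp(delta_hat) - 1)

print(round(approx_pct, 2), round(exact_pct, 2))
```

The approximation and the exact figure differ by almost three percentage points here because δ is not close to zero.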
$$\mathit{PRICE} = \beta_1 + \beta_2 \mathit{SQFT} + e$$
$$D = \begin{cases} 1 & \text{if characteristic is present} \\ 0 & \text{if characteristic is not present} \end{cases}$$
$$D = \begin{cases} 1 & \text{if property is in the desirable neighborhood} \\ 0 & \text{if property is not in the desirable neighborhood} \end{cases}$$
$$\mathit{PRICE} = \beta_1 + \delta D + \beta_2 \mathit{SQFT} + e$$
$$E(\mathit{PRICE}) = \begin{cases} (\beta_1 + \delta) + \beta_2 \mathit{SQFT} & \text{when } D = 1 \\ \beta_1 + \beta_2 \mathit{SQFT} & \text{when } D = 0 \end{cases}$$
$$\mathit{LD} = \begin{cases} 1 & \text{if property is not in the desirable neighborhood} \\ 0 & \text{if property is in the desirable neighborhood} \end{cases}$$
$$\mathit{PRICE} = \beta_1 + \lambda \mathit{LD} + \beta_2 \mathit{SQFT} + e$$
$$\mathit{PRICE} = \beta_1 + \delta D + \lambda \mathit{LD} + \beta_2 \mathit{SQFT} + e$$
$$\mathit{PRICE} = \beta_1 + \beta_2 \mathit{SQFT} + \gamma (\mathit{SQFT} \times D) + e$$
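$$E(\mathit{PRICE}) = \beta_1 + \beta_2 \mathit{SQFT} + \gamma (\mathit{SQFT} \times D) = \begin{cases} \beta_1 + (\beta_2 + \gamma) \mathit{SQFT} & \text{when } D = 1 \\ \beta_1 + \beta_2 \mathit{SQFT} & \text{when } D = 0 \end{cases}$$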
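$$\frac{\partial E(\mathit{PRICE})}{\partial \mathit{SQFT}} = \begin{cases} \beta_2 + \gamma & \text{when } D = 1 \\ \beta_2 & \text{when } D = 0 \end{cases}$$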
$$\mathit{PRICE} = \beta_1 + \delta D + \beta_2 \mathit{SQFT} + \gamma (\mathit{SQFT} \times D) + e$$
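$$E(\mathit{PRICE}) = \begin{cases} (\beta_1 + \delta) + (\beta_2 + \gamma) \mathit{SQFT} & \text{when } D = 1 \\ \beta_1 + \beta_2 \mathit{SQFT} & \text{when } D = 0 \end{cases}$$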
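$$\mathit{WAGE} = \beta_1 + \beta_2 \mathit{EDUC} + \delta_1 \mathit{BLACK} + \delta_2 \mathit{FEMALE} + \gamma (\mathit{BLACK} \times \mathit{FEMALE}) + e$$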
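$$E(\mathit{WAGE}) = \begin{cases} \beta_1 + \beta_2 \mathit{EDUC} & \text{WHITE-MALE} \\ (\beta_1 + \delta_1) + \beta_2 \mathit{EDUC} & \text{BLACK-MALE} \\ (\beta_1 + \delta_2) + \beta_2 \mathit{EDUC} & \text{WHITE-FEMALE} \\ (\beta_1 + \delta_1 + \delta_2 + \gamma) + \beta_2 \mathit{EDUC} & \text{BLACK-FEMALE} \end{cases}$$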
$$F = \frac{(\mathit{SSE}_R - \mathit{SSE}_U)/J}{\mathit{SSE}_U/(N - K)}$$
$$\widehat{\mathit{WAGE}} = -6.7103 + 1.9803\,\mathit{EDUC}$$
$$(\text{se}) \qquad (1.9142) \qquad (0.1361)$$
$$F = \frac{(\mathit{SSE}_R - \mathit{SSE}_U)/J}{\mathit{SSE}_U/(N - K)} = \frac{(135771.1 - 130194.7)/3}{130194.7/995} = 14.21$$
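$$\begin{aligned} \mathit{WAGE} = {} & \beta_1 + \beta_2 \mathit{EDUC} + \delta_1 \mathit{BLACK} + \delta_2 \mathit{FEMALE} \\ & + \gamma (\mathit{BLACK} \times \mathit{FEMALE}) + \theta_1 \mathit{SOUTH} \\ & + \theta_2 (\mathit{EDUC} \times \mathit{SOUTH}) + \theta_3 (\mathit{BLACK} \times \mathit{SOUTH}) \\ & + \theta_4 (\mathit{FEMALE} \times \mathit{SOUTH}) \\ & + \theta_5 (\mathit{BLACK} \times \mathit{FEMALE} \times \mathit{SOUTH}) + e \end{aligned}$$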
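$$E(\mathit{WAGE}) = \begin{cases} \beta_1 + \beta_2 \mathit{EDUC} + \delta_1 \mathit{BLACK} + \delta_2 \mathit{FEMALE} + \gamma (\mathit{BLACK} \times \mathit{FEMALE}) & \mathit{SOUTH} = 0 \\ (\beta_1 + \theta_1) + (\beta_2 + \theta_2) \mathit{EDUC} + (\delta_1 + \theta_3) \mathit{BLACK} + (\delta_2 + \theta_4) \mathit{FEMALE} + (\gamma + \theta_5)(\mathit{BLACK} \times \mathit{FEMALE}) & \mathit{SOUTH} = 1 \end{cases}$$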
$$\mathit{SSE}_{full} = \mathit{SSE}_{nonsouth} + \mathit{SSE}_{south} = 89088.5 + 40895.9 = 129984.4$$
$$H_0: \theta_1 = \theta_2 = \theta_3 = \theta_4 = \theta_5 = 0$$
$$F = \frac{(\mathit{SSE}_R - \mathit{SSE}_U)/J}{\mathit{SSE}_U/(N - K)} = \frac{(130194.7 - 129984.4)/5}{129984.4/990} = 0.3203$$
$$\ln(\mathit{WAGE}) = \beta_1 + \beta_2 \mathit{EDUC} + \delta \mathit{FEMALE}$$
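$$\ln(\mathit{WAGE}) = \begin{cases} \beta_1 + \beta_2 \mathit{EDUC} & \text{MALES } (\mathit{FEMALE} = 0) \\ (\beta_1 + \delta) + \beta_2 \mathit{EDUC} & \text{FEMALES } (\mathit{FEMALE} = 1) \end{cases}$$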
$$\ln(\mathit{WAGE})_{FEMALES} - \ln(\mathit{WAGE})_{MALES} = \delta$$
$$\widehat{\ln(\mathit{WAGE})} = 1.6539 + 0.0962\,\mathit{EDUC} - 0.2432\,\mathit{FEMALE}$$
$$(\text{se}) \qquad (0.0844) \qquad (0.0060) \qquad (0.0327)$$
$$\ln(\mathit{WAGE})_{FEMALES} - \ln(\mathit{WAGE})_{MALES} = \ln\left(\frac{\mathit{WAGE}_{FEMALES}}{\mathit{WAGE}_{MALES}}\right) = \delta$$
$$\frac{\mathit{WAGE}_{FEMALES}}{\mathit{WAGE}_{MALES}} = e^{\delta}$$
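$$\frac{\mathit{WAGE}_{FEMALES}}{\mathit{WAGE}_{MALES}} - \frac{\mathit{WAGE}_{MALES}}{\mathit{WAGE}_{MALES}} = \frac{\mathit{WAGE}_{FEMALES} - \mathit{WAGE}_{MALES}}{\mathit{WAGE}_{MALES}} = e^{\delta} - 1$$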