The FE and FD estimators are consistent, but they cannot identify the coefficients of time-invariant regressors. The FE estimator eliminates anything that is time-invariant from the model.
This may be a high price to pay for allowing the $x$-variables to be correlated with the individual-specific heterogeneity $u_i$.
For example, we may be interested in the effect of time-invariant variables (like gender) on a person's wage. It is possible to derive IV estimators that can be considered to lie between the FE and RE approaches.
The Hausman-Taylor (1981) estimator is an IV estimator that enables the coefficients of time-invariant regressors to be estimated.
The key step is to distinguish between regressors that are uncorrelated with $u_i$ and those that are potentially correlated with $u_i$.
The method additionally distinguishes between time-varying and time-invariant regressors.
The individual effects model is then written as:
$$y_{it}=\beta_0+x'_{1it}\beta_1+x'_{2it}\beta_2+w'_{1i}\alpha_1+w'_{2i}\alpha_2+u_i+\varepsilon_{it} \tag{1}$$
where:
$x_{1it}$ = $k_1$ variables (exogenous) that are time-varying and uncorrelated with $u_i$.
$x_{2it}$ = $k_2$ variables (endogenous) that are time-varying and correlated with $u_i$.
$w_{1i}$ = $l_1$ variables (exogenous) that are time-invariant and uncorrelated with $u_i$.
$w_{2i}$ = $l_2$ variables (endogenous) that are time-invariant and correlated with $u_i$.
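The assumptions are:
$$E(u_i \mid x_{1it}, w_{1i}) = 0 \quad \text{but} \quad E(u_i \mid x_{2it}, w_{2i}) \neq 0$$
$$\operatorname{Var}(u_i \mid x_{1it}, x_{2it}, w_{1i}, w_{2i}) = \sigma_u^2$$
$$\operatorname{Cov}(u_i, \varepsilon_{it} \mid x_{1it}, x_{2it}, w_{1i}, w_{2i}) = 0$$
$$\operatorname{Corr}(u_i+\varepsilon_{it},\, u_i+\varepsilon_{is} \mid x_{1it}, x_{2it}, w_{1i}, w_{2i}) = \rho = \sigma_u^2/\sigma^2, \quad t \neq s$$
where $\sigma^2=\sigma_u^2+\sigma_\varepsilon^2$ is the variance of the composite error.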
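The last assumption can be unpacked in a few lines. This is only a sketch: it assumes, in addition to what is stated above, that $\varepsilon_{it}$ has constant variance $\sigma_\varepsilon^2$ and is uncorrelated across periods, which are the usual RE assumptions.

```latex
\begin{aligned}
\operatorname{Var}(u_i+\varepsilon_{it}) &= \sigma_u^2+\sigma_\varepsilon^2 \equiv \sigma^2, \\
\operatorname{Cov}(u_i+\varepsilon_{it},\, u_i+\varepsilon_{is}) &= \operatorname{Var}(u_i) = \sigma_u^2 \qquad (t \neq s), \\
\rho &= \frac{\sigma_u^2}{\sigma_u^2+\sigma_\varepsilon^2}.
\end{aligned}
```

So $\rho$ is the share of the composite error variance that comes from the individual effect.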
OLS and GLS estimates of Eq. (1) are inconsistent because some regressors are correlated with the unobserved individual effects.
The FE estimator does not allow $\alpha_1$ and $\alpha_2$ to be estimated. RE cannot be used either: the correlation between the regressors associated with $\beta_2$ and $\alpha_2$ and the individual effects $u_i$ makes the estimates inconsistent.
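The Hausman-Taylor (H-T) method is based on the RE transformation, which leads to the model
$$\tilde{y}_{it}=\tilde{x}'_{1it}\beta_1+\tilde{x}'_{2it}\beta_2+\tilde{w}'_{1i}\alpha_1+\tilde{w}'_{2i}\alpha_2+\tilde{u}_i+\tilde{\varepsilon}_{it} \tag{2}$$
where, for example, $\tilde{x}_{1it}=x_{1it}-\hat{\theta}_i\bar{x}_{1i}$.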
Steps of the H-T procedure:
1. Estimate the model by OLS using deviations from the "temporal" (individual) means:
$$(y_{it}-\bar{y}_i)=(x_{1it}-\bar{x}_{1i})'\beta_1+(x_{2it}-\bar{x}_{2i})'\beta_2+(\varepsilon_{it}-\bar{\varepsilon}_i) \tag{3}$$
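2. (a) From Step 1, use the residuals $\tilde{\varepsilon}_{it}$, computed from the data in levels as $\tilde{\varepsilon}_{it}=y_{it}-x'_{1it}\hat{\beta}_1-x'_{2it}\hat{\beta}_2$ (the residuals of the demeaned regression average to zero within each group), to compute the "intra-group" temporal mean of the residuals:
$$\bar{\varepsilon}_i=\frac{1}{T}\sum_{t=1}^{T}\tilde{\varepsilon}_{it} \tag{4}$$
and stack them into the vector $\bar{e}'=\big((\overbrace{\bar{\varepsilon}_1,\bar{\varepsilon}_1,\ldots,\bar{\varepsilon}_1}^{T}),\ldots,(\bar{\varepsilon}_n,\bar{\varepsilon}_n,\ldots,\bar{\varepsilon}_n)\big)$.
(b) Run the first-stage (IV) regression:
$$w_{2i}=\gamma_0+w'_{1i}\gamma_1+x'_{1it}\gamma_2+e_{it} \tag{5}$$
(c) Use the predicted values $\hat{w}_{2i}$ from (b) in the big matrix $W=(W^*_1,\hat{W}^*_2)$, where the matrices $W_k$ are formed using the $w_{ki}$ for each group $i$.
(d) Regress:
$$\bar{\varepsilon}_{i}=\alpha_0+\alpha_1W^*_{1it}+\alpha_2\hat{W}^*_{2it}+\vartheta_{it} \tag{7}$$
(e) Note: steps (b) to (d) together are a 2SLS regression.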
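3. From Step 1, estimate $\sigma^2_\varepsilon$ from the within regression.
From Step 2, take the estimate of $\sigma^{*2}$ from the 2SLS regression and use it to estimate the RE variance component $\sigma^2_u$.
Since
$$\sigma^{*2}=\sigma^2_u+\frac{\sigma^2_\varepsilon}{T}$$
an estimate of $\sigma^2_u$ is
$$\hat{\sigma}^2_u=\hat{\sigma}^{*2}-\frac{\hat{\sigma}^2_\varepsilon}{T}$$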
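4. We need weights to compute the FGLS estimator.
Let
$$\hat{\theta}=1-\sqrt{\frac{\hat{\sigma}^2_\varepsilon}{\hat{\sigma}^2_\varepsilon+T\hat{\sigma}^2_u}}$$
then, for each group $i$, let
$$V^*_{it}=[x'_{1it},x'_{2it},w'_{1i},w'_{2i}]-\hat{\theta}\,[\bar{x}'_{1i},\bar{x}'_{2i},w'_{1i},w'_{2i}] \tag{8}$$
$$y^*_{it}=y_{it}-\hat{\theta}\,\bar{y}_i \tag{9}$$
$$z'_{it}=[(x_{1it}-\bar{x}_{1i})',(x_{2it}-\bar{x}_{2i})',w'_{1i},\bar{x}'_{1i}] \tag{10}$$
be the new weighted data and $z'$ the matrix of instruments. Then do a 2SLS regression of $y^*$ on $V^*$ with instruments $z'$:
(a) Regress $V^*$ on $z'$, then generate the predicted values $\hat{V}^*$.
(b) Regress $y^*$ on the predicted values $\hat{V}^*$ to get $(\hat{\beta}',\hat{\alpha}')'$.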
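5. To get the variance of $(\hat{\beta}',\hat{\alpha}')'$, one should not use the residuals of the second-stage regression in 4(b), because the variance estimate based on them is not consistent. The residuals should be computed with $V^*$ rather than $\hat{V}^*$. See Greene, Ch. 8, Eq. (8.8).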
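The arithmetic in Steps 3 and 4 can be sketched in a few lines of Python. This is an illustration rather than a full H-T implementation: the function names are hypothetical, the panel is assumed balanced with $T$ periods, and the weight follows the convention of Eq. (2), $\tilde{x}_{it}=x_{it}-\hat{\theta}\bar{x}_i$, which corresponds to $\hat{\theta}=1-\sqrt{\hat{\sigma}^2_\varepsilon/(\hat{\sigma}^2_\varepsilon+T\hat{\sigma}^2_u)}$. Truncating a negative variance estimate at zero is a choice made for this sketch only.

```python
import math


def variance_component_u(sigma2_eps, sigma2_star, T):
    """Step 3: back out sigma_u^2 from sigma*^2 = sigma_u^2 + sigma_eps^2 / T."""
    sigma2_u = sigma2_star - sigma2_eps / T
    # Assumption of this sketch: truncate at zero if the estimate is negative.
    return max(sigma2_u, 0.0)


def fgls_weight(sigma2_eps, sigma2_u, T):
    """Step 4: theta = 1 - sqrt(sigma_eps^2 / (sigma_eps^2 + T * sigma_u^2))."""
    return 1.0 - math.sqrt(sigma2_eps / (sigma2_eps + T * sigma2_u))


def quasi_demean(values, theta):
    """Transform one individual's T observations: v_it - theta * mean_i."""
    mean_i = sum(values) / len(values)
    return [v - theta * mean_i for v in values]


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    s2_u = variance_component_u(sigma2_eps=0.04, sigma2_star=0.26, T=4)
    theta = fgls_weight(sigma2_eps=0.04, sigma2_u=s2_u, T=4)
    print(s2_u, theta)
    print(quasi_demean([1.0, 2.0, 3.0], theta))
```

With `theta = 1` the transformation reduces to the within (FE) transformation, and with `theta = 0` the data are left unchanged, which is one way to see the H-T estimator as lying between FE and pooled estimation.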
The steps of the H-T procedure amount to estimating Eq. (2) by instrumental variables, using the instrument set in Eq. (10): $(x_{1it}-\bar{x}_{1i})$, $(x_{2it}-\bar{x}_{2i})$, $w_{1i}$ and $\bar{x}_{1i}$:
(a) $\tilde{x}_{2it}$ is instrumented by its deviation from individual means, $(x_{2it}-\bar{x}_{2i})$.
(b) $\tilde{w}_{2i}$ is instrumented by the individual average of $x_{1it}$, $\bar{x}_{1i}$.
(c) $\tilde{x}_{1it}$ is instrumented by its deviation from individual means, $(x_{1it}-\bar{x}_{1i})$.
(d) $\tilde{w}_{1i}$ is instrumented by $w_{1i}$.
The H-T estimator is based on an IV estimator that uses both the within and between transformations of the strictly exogenous variables as instruments.
1. The exogenous variables serve as their own IVs.
2. The within transformations of the exogenous time-varying variables serve as IVs for the endogenous time-varying variables.
3. The individual means of the exogenous time-varying variables are used as IVs for the endogenous time-invariant regressors.
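As a rough illustration of how such an instrument set could be assembled for one individual, the sketch below builds the rows $[(x_{1it}-\bar{x}_{1i})', (x_{2it}-\bar{x}_{2i})', w'_{1i}, \bar{x}'_{1i}]$. The function names and the list-of-rows data layout are assumptions made for this example and are not taken from any library.

```python
def column_means(rows):
    """Average each column over the T rows of one individual."""
    T = len(rows)
    return [sum(col) / T for col in zip(*rows)]


def ht_instrument_rows(x1_rows, x2_rows, w1):
    """Build the instrument rows for one individual.

    x1_rows, x2_rows: T rows of exogenous / endogenous time-varying regressors.
    w1: the exogenous time-invariant regressors of this individual.
    """
    x1_bar = column_means(x1_rows)
    x2_bar = column_means(x2_rows)
    z_rows = []
    for x1_t, x2_t in zip(x1_rows, x2_rows):
        within_x1 = [v - m for v, m in zip(x1_t, x1_bar)]  # deviations from means
        within_x2 = [v - m for v, m in zip(x2_t, x2_bar)]
        # w1 serves as its own instrument; x1_bar instruments the endogenous w2.
        z_rows.append(within_x1 + within_x2 + list(w1) + x1_bar)
    return z_rows


if __name__ == "__main__":
    # One individual, T = 2, one variable in each group (made-up numbers).
    for row in ht_instrument_rows([[1.0], [3.0]], [[10.0], [20.0]], [1.0]):
        print(row)
```

Note that the endogenous time-invariant variables $w_{2i}$ never enter the instrument rows; their place is taken by the individual means $\bar{x}_{1i}$.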
If the model is identified, in the sense that there are at least as many time-varying exogenous regressors $x_{1it}$ as there are time-invariant endogenous regressors $w_{2i}$, that is $k_1 \geq l_2$, then the H-T estimator is at least as efficient as FE, and strictly more efficient when $k_1 > l_2$.
If the model is under-identified, where $k_1 < l_2$, then one cannot estimate the parameters $\alpha_1$ and $\alpha_2$, and the H-T estimator of $\beta_1$ and $\beta_2$ is identical to FE.
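The order condition is easy to check before estimation. The helper below is a hypothetical illustration (the function name and the three-way labelling are not from the source); it compares the number of exogenous time-varying regressors $k_1$ with the number of endogenous time-invariant regressors $l_2$.

```python
def ht_order_condition(k1, l2):
    """Classify the H-T model by comparing k1 (exogenous, time-varying)
    with l2 (endogenous, time-invariant)."""
    if k1 < l2:
        return "under-identified"   # alpha cannot be estimated
    if k1 == l2:
        return "just-identified"
    return "over-identified"        # more instruments than needed


if __name__ == "__main__":
    # For illustration: four exogenous time-varying regressors and
    # one endogenous time-invariant regressor.
    print(ht_order_condition(k1=4, l2=1))
```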
The resulting H-T estimator allows us to estimate the effect of time-invariant variables, even though some of the time-varying regressors are correlated with $u_i$.
The trick is to use the time averages of those time-varying regressors that are uncorrelated with $u_i$ as instruments for the time-invariant regressors.
This requires that the model includes enough time-varying variables that are uncorrelated with $u_i$.
The strong advantage of the H-T approach is that one does not have to use external instruments. With sufficient assumptions, instruments can be derived within the model.
ESTIMATION WITH STATA
To estimate the H-T model, we again use Paneldata01.dta.
In the data, the log wage (lwage) is assumed to be a function of weeks worked (wks), living in the south (south), living in a metropolitan area (smsa), marital status (ms), years of education (ed), a quadratic in work experience (exp, exp2), working in manufacturing (ind), having the wage set by a union contract (union), being in a blue-collar occupation (occ), being female (fem) and being African American (blk).
Let us use the xtsum command to show the within variability and to see which variables are time-invariant.
xtsum lwage exp exp2 wks ms union occ south smsa ind fem blk ed
We check the correlation between the exogenous variables (south, smsa, ind, occ, fem and blk) and the endogenous time-invariant variable (ed):
pwcorr south smsa ind occ fem blk ed, star(0.05)
The results indicate that, although fem appears to be a weak instrument, the remaining instruments are probably sufficiently correlated with ed to identify its coefficient.
Weak IVs lead to inconsistent estimates of the coefficients on the endogenous variables, because there is not enough information to identify the parameters, and they cause serious size distortions in any test performed (Stock, Wright & Yogo, 2002).
Let us also check the correlation between the exogenous variables (south, smsa, ind, occ, fem and blk) and the endogenous time-varying variables (wks, ms, exp, exp2, union):
pwcorr south smsa ind occ fem blk wks ms exp exp2 union, star(0.05)
Now, the H-T model we want to estimate becomes:
$$y_{it}=\beta_0+x'_{1it}\beta_1+x'_{2it}\beta_2+w'_{1i}\alpha_1+w'_{2i}\alpha_2+u_i+\varepsilon_{it} \tag{11}$$
where:
$y_{it}$ = lwage
$x_{1it}$ = south, smsa, ind, occ
$x_{2it}$ = wks, ms, exp, exp2, union
$w_{1i}$ = fem, blk
$w_{2i}$ = ed
Before we estimate the H-T model, let us first estimate the FE and RE models:
xtreg lwage south smsa ind occ fem blk wks ms exp exp2 union ed, fe
xtreg lwage south smsa ind occ fem blk wks ms exp exp2 union ed, re
Now, perform the Hausman test to choose between the FE and RE models:
quietly xtreg lwage south smsa ind occ fem blk wks ms exp exp2 union ed, fe
estimates store fe
quietly xtreg lwage south smsa ind occ fem blk wks ms exp exp2 union ed, re
estimates store re
hausman fe re, sigmamore
The results show that the $\chi^2(9)$ statistic has $p=0.000$.
This leads to a strong rejection of the null hypothesis that RE provides consistent estimates.
That means the FE model is preferred.
The problem now is that we cannot retrieve the coefficients for fem, blk and ed, because the FE method drops variables that are time-invariant.
To estimate Eq. (11) by H-T, with wks, ms, exp, exp2, union and ed as the endogenous variables:
xthtaylor lwage south smsa ind occ fem blk wks ms exp exp2 union ed, endog(exp exp2 wks ms union ed)
The estimates $\hat{\sigma}_u=0.9418$ and $\hat{\sigma}_\varepsilon=0.1518$ indicate that a large fraction of the total error variance is attributed to $u_i$.
The $z$-statistics indicate that several of the coefficients may not be significantly different from zero.
The coefficients on the time-invariant variables fem and blk have relatively large standard errors, whereas the standard error of the coefficient on ed is relatively small.
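The statement about the share of the error variance attributed to $u_i$ can be checked by hand from the two reported standard deviations. The short Python calculation below uses only those rounded values, so the result may differ slightly from the unrounded figure Stata reports.

```python
sigma_u = 0.9418   # reported standard deviation of u_i
sigma_e = 0.1518   # reported standard deviation of the idiosyncratic error

# Fraction of the total error variance due to the individual effect.
rho = sigma_u**2 / (sigma_u**2 + sigma_e**2)
print(round(rho, 4))  # prints 0.9747
```

Roughly 97% of the composite error variance is therefore attributed to the individual effect.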