The essential distinction in panel data analysis is that between FE and RE models. If the effects are fixed, then the pooled OLS and RE estimators are inconsistent, and the within (or FE) estimator needs to be used instead. The within estimator is otherwise less desirable, because using only within variation leads to less efficient estimation and makes it impossible to estimate the coefficients of time-invariant regressors.
Panel data provide an alternative way to obtain consistent estimates.
Consider the model with individual effects:
\(y_{it}=\beta_{0}+\beta_{1}x_{it}+u_{i}+v_{it}\) (1)
Consistency in the model of Eq(1) requires only the weaker assumption that \(E\left( v_{it}|u_{i},x_{it} \right)=0\). Essentially, the error has two components: the time-invariant component \(u_{i}\), which may be correlated with the regressors and which we can eliminate through differencing, and a time-varying component \(v_{it}\) (the idiosyncratic error) that, given \(u_{i}\), is uncorrelated with the regressors.
The RE model adds an additional assumption to the individual-effects model: \(u_{i}\) is distributed independently of \(x_{it}\). This is a much stronger assumption. Combined with \(E\left( v_{it}|u_{i},x_{it} \right)=0\), it implies that the composite error satisfies \(E\left( u_{i}+v_{it}|x_{it} \right)=0\) (taking \(u_{i}\) to have mean zero), which is the same condition that the pooled OLS model assumes.
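The difference between the two assumptions can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: the data-generating process, the sample sizes, the true slope of 2 and all function names are hypothetical and are not taken from the airline data. It generates a panel in which \(x_{it}\) depends on \(u_{i}\), then compares the pooled OLS slope with the within (FE) slope.

```python
import random


def simulate_panel(n_units=500, n_periods=5, beta=2.0, seed=0):
    """Simulate y_it = beta*x_it + u_i + v_it where x_it is correlated with u_i."""
    rng = random.Random(seed)
    rows = []  # each row is (unit, x, y)
    for i in range(n_units):
        u = rng.gauss(0, 1)  # time-invariant individual effect
        for _ in range(n_periods):
            x = u + rng.gauss(0, 1)  # regressor depends on the individual effect
            y = beta * x + u + rng.gauss(0, 1)  # last term is the idiosyncratic error
            rows.append((i, x, y))
    return rows


def ols_slope(xs, ys):
    """Slope from a simple regression of y on x with an intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def within_transform(rows):
    """Subtract each unit's time mean from x and y (removes u_i)."""
    sums = {}
    for i, x, y in rows:
        sx, sy, n = sums.get(i, (0.0, 0.0, 0))
        sums[i] = (sx + x, sy + y, n + 1)
    xs = [x - sums[i][0] / sums[i][2] for i, x, _ in rows]
    ys = [y - sums[i][1] / sums[i][2] for i, _, y in rows]
    return xs, ys


rows = simulate_panel()
pooled = ols_slope([r[1] for r in rows], [r[2] for r in rows])
xs_w, ys_w = within_transform(rows)
within = ols_slope(xs_w, ys_w)
print(f"pooled OLS slope: {pooled:.3f}")  # biased upward in this design
print(f"within (FE) slope: {within:.3f}")  # close to the true value of 2
```

In this design the pooled slope picks up the correlation between \(x_{it}\) and \(u_{i}\), while demeaning within each unit removes \(u_{i}\) and recovers the true slope up to sampling error.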
The Hausman Test
The (Durbin-Wu-)Hausman (1978) test (also called the Hausman specification test) in general detects endogenous regressors (explanatory variables) in a regression model. Endogenous variables have values that are determined by other variables in the system.
Having endogenous regressors in a model causes the OLS estimator to be inconsistent, because one of the OLS assumptions is that there is no correlation between the explanatory variables and the error term. An instrumental variables estimator can be used as an alternative in this case.
In panel data analysis, the Hausman test can help us choose between the FE model and the RE model. The null hypothesis is that the preferred model is the RE model; the alternative hypothesis is that the preferred model is the FE model.
Essentially, the test checks whether the unit-specific (time-invariant) errors \(u_{i}\) are correlated with the regressors in the model. The null hypothesis is that there is no such correlation.
This test requires two things. First, the idiosyncratic error must be strictly exogenous, \(E\left( v_{it}|x_{it},u_{i} \right)=0\), so that the FE estimator is consistent and the RE estimator is consistent under the null. Second, both the idiosyncratic error term and the unobserved effect must have constant variance, i.e. \(v_{it}\sim IID\left( 0,\sigma_{v}^{2} \right)\) and \(u_{i}\sim IID\left( 0,\sigma_{u}^{2} \right)\). If the second requirement fails, the resulting test has an asymptotic size that is smaller or larger than its nominal size.
The Hausman test is intended to assess how the parameter estimates differ across the two methods, based on an understanding of the trade-off between bias and variance in the two estimators. The RE model can introduce bias but reduces the variance of the coefficient estimates, while the FE model remains unbiased but has a high degree of variance (Clark and Linzer, 2013). Moreover, rejecting the null hypothesis that the FE and RE estimators do not differ substantially means that the error term under the RE model is probably correlated with one or more regressors (Gujarati and Porter, 2009).
The Hausman test can be used in all situations where two model specifications and two estimators are available with the following properties:
1. In the restricted model (null), the estimator \(\hat{\theta }\) is efficient; the estimator \(\tilde{\theta }\) is consistent, though typically not efficient.
2. In the unrestricted model (alternative), the estimator \(\hat{\theta }\) is inconsistent; the estimator \(\tilde{\theta }\) is consistent.
Then the difference \(q=\hat{\theta }-\tilde{\theta }\) should diverge under the alternative and should converge to zero under the null. Moreover, under the null, \(q\) and \(\hat{\theta }\) should be uncorrelated.
The null of the RE model and the alternative of the FE model correspond to the Hausman situation:
1. In the RE model, the GLS-type RE estimator is efficient by construction for Gaussian errors; the FE estimator and even the OLS estimator are consistent.
2. In the FE model, the RE estimator is inconsistent because of the omitted-variable effects, while FE is consistent by construction.
The Hausman test statistic is defined as
\(m=q'{{\left( \operatorname{var}{{\hat{\beta }}_{FE}}-\operatorname{var}{{\hat{\beta }}_{RE}} \right)}^{-1}}q\) (2)
with \(q={{\hat{\beta }}_{FE}}-{{\hat{\beta }}_{RE}}\). Under RE, the matrix difference in brackets is positive definite, as the RE estimator is efficient and any other estimator has a larger variance.
The statistic \(m\) is distributed \({{\chi }^{2}}\) under the null of RE, with degrees of freedom \(K\) given by the dimension of \(\beta \).
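To make the quadratic form in Eq(2) concrete, here is a minimal sketch in plain Python. The function names and the example numbers are hypothetical and chosen only so that the arithmetic is easy to follow; they are not the airline estimates. Rather than inverting the variance difference explicitly, the sketch solves a small linear system.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("variance difference is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def hausman_statistic(b_fe, b_re, v_fe, v_re):
    """Return m = q' (V_FE - V_RE)^{-1} q with q = b_FE - b_RE."""
    q = [f - r for f, r in zip(b_fe, b_re)]
    diff = [[vf - vr for vf, vr in zip(row_fe, row_re)]
            for row_fe, row_re in zip(v_fe, v_re)]
    w = solve(diff, q)  # w = (V_FE - V_RE)^{-1} q
    return sum(qi * wi for qi, wi in zip(q, w))


# Hypothetical two-coefficient example (illustrative numbers only).
b_fe = [1.5, 0.0]
b_re = [0.5, -2.0]
v_fe = [[2.0, 0.0], [0.0, 5.0]]
v_re = [[1.0, 0.0], [0.0, 1.0]]
m = hausman_statistic(b_fe, b_re, v_fe, v_re)
print(m)  # q = [1, 2], V_FE - V_RE = diag(1, 4), so m = 1/1 + 4/4 = 2
```

With estimated variance matrices the difference is not guaranteed to be positive definite in finite samples, which is one reason a computed statistic can occasionally turn out negative.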
The variance estimates used in Eq(2) rely on the assumption of homoskedasticity (constant variance), because the statistic takes the RE estimator to be efficient.
However, if homoskedasticity does not hold, we can use the robust Hausman test introduced by Wooldridge (2002), based on the following auxiliary regression:
\(\left( {{y}_{it}}-\hat{\theta }{{\bar{y}}_{i}} \right)=\left( 1-\hat{\theta } \right){{\beta }_{0}}+\left( {{x}_{1it}}-\hat{\theta }{{\bar{x}}_{1i}} \right)'{{\beta }_{1}}+\left( {{x}_{1it}}-{{\bar{x}}_{1i}} \right)'\gamma +{{v}_{it}}\) (3)
where \({{x}_{1it}}\) denotes the time-varying regressors, \({{\bar{y}}_{i}}\) and \({{\bar{x}}_{1i}}\) are unit means, and \(\hat{\theta }\) is the RE quasi-demeaning parameter. The regression includes both the RE differences and the mean differences of the regressors and is estimated by standard pooled OLS with cluster-robust standard errors, as in Wooldridge (2002); the test is a joint test of \(\gamma =0\). The null hypothesis is \({{H}_{0}}:{{\gamma }_{1}}={{\gamma }_{2}}=\ldots ={{\gamma }_{K}}=0\), i.e. the RE model is appropriate, and the alternative hypothesis \({{H}_{a}}\) is that at least one \({{\gamma }_{k}}\ne 0\).
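The parameter \(\hat{\theta }\) in Eq(3) controls how much of the unit mean is removed. As a hedged sketch, the code below uses the standard balanced-panel expression \(\theta =1-\sqrt{\sigma_{v}^{2}/\left( T\sigma_{u}^{2}+\sigma_{v}^{2} \right)}\); the function names and the example variances are hypothetical, and in practice the variance components are estimated by the RE routine.

```python
import math


def re_theta(sigma_u2, sigma_v2, n_periods):
    """Quasi-demeaning weight for a balanced panel with T = n_periods."""
    return 1.0 - math.sqrt(sigma_v2 / (n_periods * sigma_u2 + sigma_v2))


def quasi_demean(values, theta):
    """Subtract theta times the unit mean from one unit's series."""
    mean = sum(values) / len(values)
    return [v - theta * mean for v in values]


# No individual effect: theta = 0, so the data are left as they are (pooled OLS).
print(re_theta(0.0, 1.0, 5))
# Equal variance components with T = 3: theta = 1 - sqrt(1/4) = 0.5.
print(re_theta(1.0, 1.0, 3))
# theta = 1 reproduces the within (mean-difference) transformation.
print(quasi_demean([1.0, 2.0, 3.0], 1.0))
```

The two limiting cases show why the RE estimator sits between pooled OLS (\(\theta =0\)) and the within estimator (\(\theta =1\)), and why Eq(3) needs both the RE differences and the mean differences.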
For our discussion using Stata, let us use the data airline.dta again, as in the earlier discussions of the FE and RE models. We want to estimate the effects of output, fuel price and load factor on the cost of airline companies:
\(\text{cost}_{it}={{\beta }_{0}}+{{\beta }_{1}}\text{output}_{it}+{{\beta }_{2}}\text{fuel}_{it}+{{\beta }_{3}}\text{load}_{it}+{{v}_{it}}\) (4)
where
\(\text{cost}_{it}\) = cost of airline companies
\(\text{output}_{it}\) = revenue passenger miles (output index)
\(\text{fuel}_{it}\) = fuel prices
\(\text{load}_{it}\) = load factor (average capacity utilization of the fleet)
Now, let us estimate Eq(4) by FE and RE and store the results in Stata's memory.
xtset airline year
quiet xtreg cost output fuel load,fe
estimates store fe
quiet xtreg cost output fuel load, re
estimates store re
and then perform the Hausman test based on Eq(2):
hausman fe re, sigmamore
The output from the Hausman test provides a nice side-by-side comparison. For the coefficient of the regressor output, a test of RE against FE yields \(t=0.0126/0.0155=0.8129\), which shows no statistically significant difference.
The overall statistic, \({{\chi }^{2}}\left( 3 \right)=3.24\), has \(p=0.3561\). We therefore fail to reject the null hypothesis that RE provides consistent estimates, which means that the RE model is preferable.
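The reported p-value can be checked by hand. The sketch below uses only the Python standard library and the closed-form upper-tail probability of a chi-squared distribution with 3 degrees of freedom; the function names are hypothetical. The p-value for the single coefficient treats the ratio 0.8129 as standard normal, which is an assumption made here for illustration.

```python
import math


def chi2_sf_3df(x):
    """Upper-tail probability P(X > x) for a chi-squared variable with 3 df."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


def two_sided_normal_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Overall Hausman statistic reported above: chi2(3) = 3.24.
p_overall = chi2_sf_3df(3.24)
print(round(p_overall, 3))  # roughly 0.356, in line with the reported p = 0.3561

# Single-coefficient comparison for output: 0.0126 / 0.0155.
z_output = 0.0126 / 0.0155
p_output = two_sided_normal_p(z_output)
print(round(z_output, 4), round(p_output, 2))
```

Both p-values are far above 0.05, which is consistent with the conclusion that the FE and RE estimates do not differ significantly.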
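To perform the robust Hausman test by Wooldridge (2002) based on Eq(3):
* Re-estimate the RE model and keep the quasi-demeaning parameter theta
quiet xtreg cost output fuel load, re
scalar theta = e(theta)
global xlist2 cost output fuel load
sort airline
* Build unit means, mean differences (md*) and RE differences (red*)
foreach x of varlist $xlist2 {
    by airline: egen mean`x' = mean(`x')
    generate md`x' = `x' - mean`x'
    generate red`x' = `x' - theta*mean`x'
}
* Auxiliary regression of Eq(3) with cluster-robust standard errors
quiet reg redcost redoutput redfuel redload mdoutput mdfuel mdload, vce(cluster airline)
* Joint test that the coefficients on the mean differences are zero
test mdoutput mdfuel mdload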
The test fails to reject the null hypothesis at the 5% significance level, and we conclude that the RE model is more appropriate.
Overidentifying Test
Following Sargan (1975) and Hansen (1982), whose test is known as the Sargan-Hansen test, a test of FE vs. RE can also be seen as a test of overidentifying restrictions. The fixed effects estimator uses the orthogonality conditions that the regressors are uncorrelated with the idiosyncratic error \({{v}_{it}}\), i.e., \(E\left( {{X}_{it}}{{v}_{it}} \right)=0\).
The random effects estimator uses the additional orthogonality conditions that the regressors are uncorrelated with the group-specific error \({{u}_{i}}\) (the "random effect"), i.e., \(E\left( {{X}_{it}}{{u}_{i}} \right)=0\).
These additional orthogonality
conditions are overidentifying restrictions. The test is implemented by xtoverid using the artificial regression approach described by Arellano (1993) and
Wooldridge (2002, pp. 290-91), in which a random effects equation is
reestimated augmented with additional variables consisting of the original
regressors transformed into deviations-from-mean form.
The test statistic is a Wald
test of the significance of these additional regressors. A large-sample
chi-squared test statistic is reported with no degrees-of-freedom
corrections. Under conditional homoskedasticity, this test statistic is
asymptotically equivalent to the usual Hausman fixed-vs-random effects test;
with a balanced panel, the artificial regression and Hausman test statistics
are numerically equal.
Unlike the Hausman version,
the test reported by xtoverid extends straightforwardly to
heteroskedastic- and cluster-robust versions, and is guaranteed always to
generate a nonnegative test statistic.
To perform the overidentifying restrictions test:
quiet xtreg cost output fuel load, re
xtoverid
The test shows that the Sargan-Hansen statistic is \({{\chi }^{2}}\left( 3 \right)=3.249\) with a p-value of 0.3547, so we fail to reject the null hypothesis that the additional orthogonality conditions of the RE model hold.
The decision is that the RE model is preferred to the FE model.