MathType

Tuesday, 9 May 2017

Fixed Effects (FE) vs. Random Effects (RE) Model with Stata (Panel)




The essential distinction in panel data analysis is that between FE and RE models.

If effects are fixed, then the pooled OLS and RE estimators are inconsistent, and instead the within (or FE) estimator needs to be used.

The within estimator is otherwise less desirable, because using only within variation leads to less-efficient estimation and inability to estimate coefficients of time-invariant regressors.

Panel data provide an alternative way to obtain consistent estimates.

Consider the model with individual effects:
\({{y}_{it}}={{\beta }_{0}}+{{\beta }_{1}}{{x}_{it}}+{{u}_{i}}+{{v}_{it}}\)                                                  (1)

Consistency in the model of Eq(1) requires the weaker assumption that \(E\left( {{v}_{it}}|{{u}_{i}},{{x}_{it}} \right)=0\). Essentially, the error has two components: the time-invariant component \({{u}_{i}}\), which may be correlated with the regressors and which we can eliminate through differencing, and a time-varying component \({{v}_{it}}\) that, given \({{u}_{i}}\), is uncorrelated with the regressors (the idiosyncratic error).
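
The elimination of \({{u}_{i}}\) can be made explicit with a short worked step. This is a sketch of the within (mean-differencing) version; the bar notation for individual time means is introduced here and is not part of Eq(1). Averaging Eq(1) over \(t\) for each \(i\) gives
\({{\bar{y}}_{i}}={{\beta }_{0}}+{{\beta }_{1}}{{\bar{x}}_{i}}+{{u}_{i}}+{{\bar{v}}_{i}}\)
and subtracting this from Eq(1) gives
\({{y}_{it}}-{{\bar{y}}_{i}}={{\beta }_{1}}\left( {{x}_{it}}-{{{\bar{x}}}_{i}} \right)+\left( {{v}_{it}}-{{{\bar{v}}}_{i}} \right)\)
so both \({{\beta }_{0}}\) and \({{u}_{i}}\) drop out, and \({{\beta }_{1}}\) is identified from within-individual variation only. The same subtraction would also wipe out any time-invariant regressor, which is why its coefficient cannot be estimated by FE.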

The RE model adds an additional assumption to the individual-effects model: \({{u}_{i}}\) is distributed independently of \({{x}_{it}}\). This is a much stronger assumption because it implies that \(E\left( {{v}_{it}}|{{u}_{i}},{{x}_{it}} \right)=E\left( {{v}_{it}}|{{x}_{it}} \right)\), so consistency requires that \(E\left( {{v}_{it}}|{{x}_{it}} \right)=0\), as assumed by the pooled OLS model.

The Hausman Test

The (Durbin-Wu-)Hausman (1978) test, also called the Hausman specification test, in general detects endogenous regressors (explanatory variables) in a regression model. Endogenous variables have values that are determined by other variables in the system.

Having endogenous regressors in a model makes the OLS estimator inconsistent, because one of the OLS assumptions is that there is no correlation between the explanatory variables and the error term. An instrumental variables estimator can be used as an alternative in this case.

In panel data analysis, the Hausman test can help us choose between the FE model and the RE model.

The null hypothesis is that the preferred model is the RE model; the alternative hypothesis is that the preferred model is the FE model.

Essentially, the test checks whether there is a correlation between the unit-specific (time-invariant) errors and the regressors in the model. The null hypothesis is that there is no correlation between the two.

This test requires two things. First, there must be strict exogeneity of the error term such that \(E\left( {{v}_{it}}|{{x}_{it}},{{u}_{i}} \right)=0\) to ensure that the RE and FE estimators are consistent.

Second, both the idiosyncratic error term and the unobserved effect must have constant variance, i.e., both \({{v}_{it}}\sim IID\left( 0,\sigma _{v}^{2} \right)\) and \({{u}_{i}}\sim IID\left( 0,\sigma _{u}^{2} \right)\). Failure to meet the second requirement implies that the resulting test would have an asymptotic size smaller or larger than the nominal size of the test.

The Hausman test is intended to assess how parameter estimates differ across the two methods, based on an understanding of the trade-off between bias and variance in the two estimators. The RE model can introduce bias but reduces the variance of the coefficient estimates, while the FE model remains unbiased but has a higher variance (Clark and Linzer, 2013). Moreover, rejection of the null hypothesis that the FE and RE estimators do not differ substantially means that the error term under the RE model is probably correlated with one or more regressors (Gujarati and Porter, 2009).

The Hausman test can be used in all situations where two model specifications and two estimators are available with the following properties:
1.       In the restricted model (null), the estimator \(\hat{\theta }\) is efficient, while the estimator \(\tilde{\theta }\) is consistent though typically not efficient;
2.       In the unrestricted model (alternative), the estimator \(\hat{\theta }\) is inconsistent, while the estimator \(\tilde{\theta }\) is consistent.
Then, the difference \(q=\hat{\theta }-\tilde{\theta }\) should diverge under the alternative and should converge to zero under the null. Moreover, under the null \(q\) and \(\hat{\theta }\) should be uncorrelated.
The null for the RE and the alternative of the FE model correspond to the Hausman situation:
1.       In the RE model, the GLS-type RE estimator is efficient by construction for Gaussian errors, while the FE estimator and even the pooled OLS estimator are consistent.
2.       In the FE model, the RE estimator is inconsistent because of the omitted-variable effects, while FE is consistent by construction.

The Hausman test statistic is defined as:
\(m=q'{{\left( \operatorname{var}{{{\hat{\beta }}}_{FE}}-\operatorname{var}{{{\hat{\beta }}}_{RE}} \right)}^{-1}}q\)                                                                    (2)

with \(q={{\hat{\beta }}_{FE}}-{{\hat{\beta }}_{RE}}\). Under RE, the matrix difference in brackets is positive definite, as the RE estimator is efficient and any other estimator has a larger variance.

The statistic \(m\) is distributed \({{\chi }^{2}}\) under the null of RE, with degrees of freedom equal to the dimension of \(\beta \), \(K\).
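
As a rough illustration of Eq(2), the following Python sketch computes \(m\) for a case with \(K=2\) coefficients. All numbers are made up for illustration and are not taken from the airline data; the function name `hausman_2x2` is hypothetical. With two degrees of freedom the upper-tail probability of the chi-squared distribution is simply \(\exp(-m/2)\), which keeps the example free of external libraries.

```python
import math

def hausman_2x2(b_fe, b_re, v_fe, v_re):
    """Hausman statistic m = q'(V_FE - V_RE)^{-1} q for K = 2 coefficients."""
    # q is the difference between the FE and RE coefficient vectors
    q = [b_fe[0] - b_re[0], b_fe[1] - b_re[1]]
    # d is the difference of the two variance matrices
    d = [[v_fe[i][j] - v_re[i][j] for j in range(2)] for i in range(2)]
    det = d[0][0] * d[1][1] - d[0][1] * d[1][0]
    if det <= 0 or d[0][0] <= 0:
        raise ValueError("V_FE - V_RE is not positive definite")
    # explicit inverse of a 2x2 matrix
    inv = [[d[1][1] / det, -d[0][1] / det],
           [-d[1][0] / det, d[0][0] / det]]
    m = sum(q[i] * inv[i][j] * q[j] for i in range(2) for j in range(2))
    p = math.exp(-m / 2)  # chi-squared(2) upper-tail probability
    return m, p

# Hypothetical estimates and variance matrices (illustration only)
m, p = hausman_2x2(
    b_fe=[0.50, 1.20],
    b_re=[0.45, 1.10],
    v_fe=[[0.010, 0.002], [0.002, 0.020]],
    v_re=[[0.006, 0.001], [0.001, 0.012]],
)
print(round(m, 4), round(p, 4))
```

The positive-definiteness check matters in practice: in finite samples the estimated difference of the variance matrices need not be positive definite, in which case the statistic of Eq(2) is not well defined.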

The estimates used in the statistic of Eq(2) rely on the assumption of homoskedasticity (constant variance), under which the RE estimator is efficient.

However, if homoskedasticity does not hold, we can use the robust Hausman test introduced by Wooldridge (2002), based on the following equation:
\(\left( {{y}_{it}}-\hat{\theta }{{{\bar{y}}}_{i}} \right)=\left( 1-\hat{\theta } \right)u+\left( {{x}_{1it}}-\hat{\theta }{{{\bar{x}}}_{1i}} \right)'{{\beta }_{1}}+\left( {{x}_{1it}}-{{{\bar{x}}}_{1i}} \right)'\gamma +{{v}_{it}}\)   (3)   


where both the RE differences and the mean differences enter the regression, which is estimated by standard pooled OLS, and the test is whether \(\gamma =0\), as in Wooldridge (2002). The null hypothesis is \({{H}_{0}}:{{\gamma }_{1}}={{\gamma }_{2}}=\ldots ={{\gamma }_{K}}=0\), under which the RE model is adequate and FE is not needed; the alternative hypothesis is \({{H}_{a}}:{{\gamma }_{k}}\ne 0\) for at least one \(k\).
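
To make the two kinds of differences in Eq(3) concrete, here is a small Python sketch. It assumes a balanced panel and uses the standard textbook expression \(\hat{\theta }=1-\sqrt{\sigma _{v}^{2}/\left( T\sigma _{u}^{2}+\sigma _{v}^{2} \right)}\) for the RE weight, which is not spelled out in the text above; the variance components, the three observations, and the function names are all hypothetical.

```python
import math

def re_theta(sigma_v2, sigma_u2, T):
    """RE quasi-demeaning weight for a balanced panel with T periods."""
    return 1.0 - math.sqrt(sigma_v2 / (T * sigma_u2 + sigma_v2))

def transform(values, theta):
    """Return (mean differences, RE differences) for one unit's series."""
    mean = sum(values) / len(values)
    md = [v - mean for v in values]            # x_it - xbar_i
    red = [v - theta * mean for v in values]   # x_it - theta * xbar_i
    return md, red

# Hypothetical variance components and one unit observed for T = 3 periods
theta = re_theta(sigma_v2=1.0, sigma_u2=1.0, T=3)
md_y, red_y = transform([2.0, 4.0, 6.0], theta)
print(theta, md_y, red_y)
```

With \(\hat{\theta }=0\) the RE difference is just the raw variable (pooled OLS), and with \(\hat{\theta }=1\) it coincides with the mean difference (the within transformation), so the RE estimator sits between the two.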

For our discussion using Stata, let us use the airline.dta data again, as in our earlier discussions of the FE and RE models. We want to estimate the effects of output, fuel price, and load factor on the cost of airline companies:
\(cos{{t}_{it}}={{\beta }_{0}}+{{\beta }_{1}}outpu{{t}_{it}}+{{\beta }_{2}}fue{{l}_{it}}+{{\beta }_{3}}loa{{d}_{it}}+{{v}_{it}}\)   (4)                   

where;
\(cos{{t}_{it}}\)  = cost of airline companies
\(outpu{{t}_{it}}\)  = revenue passenger miles (output index)
\(fue{{l}_{it}}\)  = fuel prices
\(loa{{d}_{it}}\)  = load factor (average capacity utilization of the fleet)

Now, let us estimate Eq(4) by FE and RE and store the results in Stata's memory.

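* declare the panel: airline is the unit, year is the time variable
xtset airline year
* FE (within) estimates, stored under the name fe
quietly xtreg cost output fuel load, fe
estimates store fe
* RE (GLS) estimates, stored under the name re
quietly xtreg cost output fuel load, re
estimates store re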

We then perform the Hausman test based on Eq(2):
 
hausman fe re, sigmamore

 

The output from the Hausman test provides a nice side-by-side comparison. For the coefficient of the regressor output, a test of RE against FE yields \(t=0.0126/0.0155=0.8129\), which shows no statistically significant difference.

The overall statistic, \({{\chi }^{2}}\left( 3 \right)=3.24\), has \(p=0.3561\). We therefore fail to reject the null hypothesis that RE provides consistent estimates, which means that the RE model is preferable.
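
The two numbers quoted from the Hausman output can be cross-checked by hand. The Python sketch below recomputes the ratio for output and the upper-tail probability of a chi-squared variable with 3 degrees of freedom, using the closed form \(\operatorname{erfc}\left( \sqrt{x/2} \right)+\sqrt{2x/\pi }\,{{e}^{-x/2}}\), which holds for 3 degrees of freedom only. The function name `chi2_sf_3df` is hypothetical, and the inputs are the rounded values reported above, so small discrepancies in the last digit are possible.

```python
import math

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Coefficient difference divided by its standard error, as reported for output
t_output = 0.0126 / 0.0155

# p-value of the overall Hausman statistic chi2(3) = 3.24
p_overall = chi2_sf_3df(3.24)

print(round(t_output, 4), round(p_overall, 4))
```

The result for `p_overall` should land close to the reported \(p=0.3561\), well above the usual 5% threshold.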

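To perform the robust Hausman test of Wooldridge (2002) based on Eq(3):

* RE estimates; keep the quasi-demeaning factor theta
quietly xtreg cost output fuel load, re
scalar theta = e(theta)
global xlist2 cost output fuel load
sort airline
* for each variable: airline mean, mean difference (md), and RE difference (red)
foreach x of varlist $xlist2 {
    by airline: egen mean`x' = mean(`x')
    generate md`x' = `x' - mean`x'
    generate red`x' = `x' - theta*mean`x'
}
* pooled OLS of Eq(3) with cluster-robust standard errors
quietly regress redcost redoutput redfuel redload mdoutput mdfuel mdload, vce(cluster airline)
* Wald test of gamma = 0
test mdoutput mdfuel mdload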
 
 

             
The test fails to reject the null hypothesis at the 5% significance level, and we conclude that the RE model is more appropriate.


Overidentifying Test

The test proposed by Sargan (1975) and Hansen (1982), known as the Sargan-Hansen test, allows the choice between FE and RE to be treated as a test of overidentifying restrictions.

The fixed effects estimator uses the orthogonality conditions that the regressors are uncorrelated with the idiosyncratic error \({{\varepsilon }_{it}}\), i.e., \(E\left( {{X}_{it}}{{\varepsilon }_{it}} \right)=0\).

The random effects estimator uses the additional orthogonality conditions that the regressors are uncorrelated with the group-specific error \({{u}_{i}}\) (the "random effect"), i.e., \(E\left( {{X}_{it}}{{u}_{i}} \right)=0\).

These additional orthogonality conditions are overidentifying restrictions.  The test is implemented by xtoverid using the artificial regression approach described by Arellano (1993) and Wooldridge (2002, pp. 290-91), in which a random effects equation is reestimated augmented with additional variables consisting of the original regressors transformed into deviations-from-mean form. 

The test statistic is a Wald test of the significance of these additional regressors.  A large-sample chi-squared test statistic is reported with no degrees-of-freedom corrections.  Under conditional homoskedasticity, this test statistic is asymptotically equivalent to the usual Hausman fixed-vs-random effects test; with a balanced panel, the artificial regression and Hausman test statistics are numerically equal. 

Unlike the Hausman version, the test reported by xtoverid extends straightforwardly to heteroskedastic- and cluster-robust versions, and is guaranteed always to generate a nonnegative test statistic.

To perform the overidentifying restrictions test:

quiet xtreg cost output fuel load,re
xtoverid



The test shows that the Sargan-Hansen statistic is \({{\chi }^{2}}\left( 3 \right)=3.249\) with a p-value of \(0.3547\), so we fail to reject the null hypothesis that the additional orthogonality conditions of the RE model hold.

The decision is that the RE model is preferred over the FE model.
