MathType

Sunday 26 June 2016

Within and Between Estimator with Stata (Panel)





WITHIN ESTIMATOR (The xtreg,fe command)

The individual-specific-effects model for the scalar dependent variable \({{y}_{it}}\) specifies that

                             \({{y}_{it}}={{\alpha }_{i}}+{{x}_{it}}\beta +{{\varepsilon }_{it}}\)                             (1)

where \({{x}_{it}}\) are regressors, \({{\alpha }_{i}}\) are random individual-specific effects, and \({{\varepsilon }_{it}}\) is an idiosyncratic error.

In the fixed-effects (FE) model, the \({{\alpha }_{i}}\) in Eq(1) can be eliminated by subtracting the corresponding model for the individual means:

                \({{\bar{y}}_{i}}={{\alpha }_{i}}+{{\bar{x}}_{i}}\beta +{{\bar{\varepsilon }}_{i}}\)                               (2)

where, for example, \({{\bar{x}}_{i}}=T_{i}^{-1}\sum\nolimits_{t=1}^{{{T}_{i}}}{{{x}_{it}}}\).

Subtracting Eq(2) from Eq(1) gives the within model:

                \(\left( {{y}_{it}}-{{{\bar{y}}}_{i}} \right)=\left( {{x}_{it}}-{{{\bar{x}}}_{i}} \right)\beta +\left( {{\varepsilon }_{it}}-{{{\bar{\varepsilon }}}_{i}} \right)\)                            (3)


Because \({{\alpha }_{i}}\) has been eliminated, OLS on the demeaned data leads to consistent estimates of \(\beta\) even if \({{\alpha }_{i}}\) is correlated with \({{x}_{it}}\), as is the case in the FE model.

This result is a major advantage of panel data.

The disadvantage is the inability to estimate the coefficient of a time-invariant regressor, because such a regressor is differenced out along with \({{\alpha }_{i}}\).

Also, the within estimator will be relatively imprecise for time-varying regressors that vary little over time.
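
The within transformation can also be reproduced by hand, which makes it clear what is being estimated. The lines below are a minimal sketch, assuming the panel data are already in memory with the variables used in this post (lwage, exp, exp2, wks and the identifier id) and no missing values; the prefixes mean_ and dm_ are names made up for illustration.

```stata
* Sketch only: assumes lwage, exp, exp2, wks and id are in memory
* mean_ and dm_ are illustrative variable-name prefixes
foreach v of varlist lwage exp exp2 wks {
    * individual-specific mean of the variable
    bysort id: egen double mean_`v' = mean(`v')
    * deviation from the individual mean
    generate double dm_`v' = `v' - mean_`v'
}

* OLS on the demeaned data; no constant, since demeaned variables have mean zero
regress dm_lwage dm_exp dm_exp2 dm_wks, noconstant vce(cluster id)
```

The slope coefficients should match those from the within estimator. The standard errors need not match, because regress does not know that the individual means were estimated. The variable ed is left out because its deviation from the individual mean is zero for every observation.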

Stata actually fits the model

\(\left( {{y}_{it}}-{{{\bar{y}}}_{i}}+\bar{\bar{y}} \right)=\alpha +\left( {{x}_{it}}-{{{\bar{x}}}_{i}}+\bar{\bar{x}} \right)\beta +\left( {{\varepsilon }_{it}}-{{{\bar{\varepsilon }}}_{i}}+\bar{\bar{\varepsilon }} \right)\)                     (4)

where, for example, \(\bar{\bar{y}}=\left( 1/N \right)\sum\nolimits_{i=1}^{N}{{{{\bar{y}}}_{i}}}\) is the grand mean of \({{y}_{it}}\).

This parameterization has the advantage of providing an intercept estimate, the average of the individual effects \({{\alpha }_{i}}\), while yielding the same slope estimate \(\beta \)  as that from the within model.
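
A short way to see where the intercept comes from, offered as a reading aid (the symbol \(\bar{\alpha }\) is introduced here for illustration and a balanced panel is assumed): averaging Eq(2) over the \(N\) individuals gives

                \(\bar{\bar{y}}=\bar{\alpha }+\bar{\bar{x}}\beta +\bar{\bar{\varepsilon }}\),   with \(\bar{\alpha }=\left( 1/N \right)\sum\nolimits_{i=1}^{N}{{{\alpha }_{i}}}\)

Adding this equation to the within model Eq(3) reproduces Eq(4) with \(\alpha =\bar{\alpha }\). The slope \(\beta \) is unchanged because only constants have been added to both sides.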

The within estimator is computed by using the xtreg command with the fe option.
The default standard errors assume that, after controlling for \({{\alpha }_{i}}\), the error \({{\varepsilon }_{it}}\) is i.i.d.

The vce(robust) option relaxes this assumption and provides cluster-robust standard errors, provided that observations are independent over \(i\) and \(N\to \infty \).

To estimate Eq(4) using the same variables as in the earlier discussion of within and between variation, type

xtreg lwage exp exp2 wks ed,fe vce(cluster id)

The coefficient of ed is not identified because education is time-invariant in these data.
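
One way to check which regressors the within estimator can identify is to look at their within variation before estimating. The sketch below assumes the same data are in memory and that id is the panel identifier.

```stata
* Sketch only: declare the panel identifier, then decompose the variation
xtset id
* xtsum reports overall, between and within standard deviations
xtsum ed exp wks
```

A regressor whose within standard deviation is zero, as expected here for ed, drops out of the within regression.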


WITHIN ESTIMATOR (LSDV Regression- areg command)

Another name for the within estimator is the least-squares dummy-variable (LSDV) estimator.

This is because it can be shown to equal the estimator obtained from OLS estimation of \({{y}_{it}}\) on \({{x}_{it}}\) and \(N\) individual-specific indicator variables \({{d}_{j,it}},\ j=1,...,N\), where \({{d}_{j,it}}=1\) for the it-th observation if \(j=i\), and \({{d}_{j,it}}=0\) otherwise.

Thus, we fit the model

                \({{y}_{it}}=\left( \sum\nolimits_{j=1}^{N}{{{\alpha }_{j}}{{d}_{j,it}}} \right)+{{x}_{it}}\beta +{{\varepsilon }_{it}}\)                                                (5)


This equivalence of LSDV and within estimators does not carry over to nonlinear models.
To estimate Eq(5) using the same variables as before, type

areg lwage exp exp2 wks ed, absorb(id) vce(cluster id)
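
The indicator variables in Eq(5) can also be written out explicitly instead of being absorbed. This is a sketch under the same assumptions about the data in memory; with many individuals it produces a long coefficient table and can be slow.

```stata
* Sketch only: LSDV with explicit individual indicators via factor-variable notation
* ed is left out because it is collinear with the id indicators
regress lwage exp exp2 wks i.id, vce(cluster id)
```

The slope coefficients on exp, exp2 and wks should agree with those from areg, which fits the same model without reporting the indicator coefficients.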

The coefficient estimates are the same as those from xtreg,fe.

The cluster-robust standard errors differ because the two commands apply different small-sample corrections.

This difference arises because inference for areg is designed for the case where \(N\) is fixed and \(T\to \infty \), whereas we are considering the short-panel case, where \(T\) is fixed and \(N\to \infty \).

Thus, xtreg,fe should be used.
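
To see the two sets of results side by side, both fits can be stored and tabulated. This is a sketch under the same assumptions as before; the names fe_xtreg and fe_areg are labels chosen for illustration.

```stata
* Sketch only: store both fits and compare coefficients and standard errors
xtset id
quietly xtreg lwage exp exp2 wks ed, fe vce(cluster id)
estimates store fe_xtreg
quietly areg lwage exp exp2 wks ed, absorb(id) vce(cluster id)
estimates store fe_areg
* b() sets the display format, se adds standard errors below the coefficients
estimates table fe_xtreg fe_areg, b(%9.4f) se
```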


BETWEEN ESTIMATOR (The xtreg,be command)

The between estimator uses only between (cross-section) variation in the data and is the OLS estimator from the regression of \({{\bar{y}}_{i}}\) on an intercept and \({{\bar{x}}_{i}}\).

Because only cross-section variation in the data is used, the coefficient of any individual-invariant regressor (such as time dummies) cannot be identified.

The between estimator is inconsistent in the FE model but consistent in the RE model.

To explain this, average the individual-effects model Eq(1) over time to obtain the between model:

                \({{\bar{y}}_{i}}=\alpha +{{\bar{x}}_{i}}\beta +\left( {{\alpha }_{i}}-\alpha +{{{\bar{\varepsilon }}}_{i}} \right)\)                  (6)

The between estimator is the OLS estimator in this model.

Consistency requires that the error term \(\left( {{\alpha }_{i}}-\alpha +{{{\bar{\varepsilon }}}_{i}} \right)\) be uncorrelated with \({{x}_{it}}\).

This is the case if \({{\alpha }_{i}}\) is a random effect but not if \({{\alpha }_{i}}\) is a fixed effect.
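
The between regression can be reproduced by hand by collapsing the data to one observation per individual. The sketch below assumes the same data are in memory and uses preserve and restore so that the original panel is left unchanged.

```stata
* Sketch only: between estimator by hand
preserve
* replace the panel with individual means, one row per id
collapse (mean) lwage exp exp2 ed, by(id)
* OLS of the mean of lwage on the means of the regressors
regress lwage exp exp2 ed
restore
```

The coefficients should match those reported by the between estimator for the same regressors. Note that ed is identified here, because it varies across individuals.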

To estimate Eq(6) using the same variables as before, type in the Command window:

xtreg lwage exp exp2 ed,be

The estimates and standard errors are closer to those obtained from pooled OLS than to those from within estimation.
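
This comparison can be laid out in a single table. The sketch below is one way to do it under the same assumptions about the data in memory; the names est_pooled, est_be and est_fe are labels chosen for illustration, and the regressor list follows the between regression above.

```stata
* Sketch only: pooled OLS, between and within estimates side by side
xtset id
quietly regress lwage exp exp2 ed, vce(cluster id)
estimates store est_pooled
quietly xtreg lwage exp exp2 ed, be
estimates store est_be
quietly xtreg lwage exp exp2 ed, fe vce(cluster id)
estimates store est_fe
estimates table est_pooled est_be est_fe, b(%9.4f) se
```

In the within column, ed is reported as omitted, in line with the identification issue noted earlier.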