MathType

Wednesday, 15 November 2017

Dynamic Panel Data : IV and GMM Estimation with Stata (Panel)



Many economic relationships are dynamic in nature, and one of the advantages of panel data is that they allow the researcher to better understand the dynamics of adjustment. Some economic models suggest that current behavior depends upon past behavior, so in many cases we would like to estimate a dynamic model at the individual level. The ability to do so is unique to panel data.

The dynamic relationship in panel data is characterized by the presence of a lagged dependent variable among the regressors:

\({{y}_{it}}={{{x}'}_{it}}\beta +\gamma {{y}_{i,t-1}}+{{\alpha }_{i}}+{{\varepsilon }_{it}}\)                                                (1)

where it is assumed that \({{\varepsilon }_{it}}\) is \(IID\left( 0,\sigma _{\varepsilon }^{2} \right)\).

Including the lagged dependent variable in the model creates a basic problem for both the FE and the RE estimator.
1)      For the FE estimator:

\({{y}_{it}}-{{\bar{y}}_{i}}=\gamma \left( {{y}_{i,t-1}}-{{{\bar{y}}}_{i,-1}} \right)+{{\left( {{x}_{it}}-{{{\bar{x}}}_{i}} \right)}^{\prime }}\beta +\left( {{\varepsilon }_{it}}-{{{\bar{\varepsilon }}}_{i}} \right)\)                       (2)

The within transformation wipes out the \({{\alpha }_{i}}\), but \(\left( {{y}_{i,t-1}}-{{{\bar{y}}}_{i,-1}} \right)\) will be correlated with \(\left( {{\varepsilon }_{it}}-{{{\bar{\varepsilon }}}_{i}} \right)\) even if the \({{\varepsilon }_{it}}\) are not serially correlated. This is because \({{y}_{i,t-1}}\) is correlated with \({{\bar{\varepsilon }}_{i}}\): the latter average contains \({{\varepsilon }_{i,t-1}}\), which is obviously correlated with \({{y}_{i,t-1}}\).

2)      For the RE estimator, in order to apply GLS, quasi-demeaning is performed and \(\left( {{y}_{i,t-1}}-\theta {{{\bar{y}}}_{i,-1}} \right)\) will be correlated with \(\left( {{\varepsilon }_{it}}-\theta {{{\bar{\varepsilon }}}_{i}} \right)\).

That means that, for a dynamic panel data model with fixed \(T\), the standard estimator is biased and inconsistent, whether the effects are treated as fixed or random. This bias is of order \(1/T\) and disappears only if \(T\to \infty\). The bias can be serious when \(T\) is small, and it does not vanish as \(N\to \infty \).
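A small simulation can make the size of this bias concrete. The sketch below generates an AR(1) panel with individual effects and applies the FE estimator. The true coefficient of 0.5, the choice of 2000 individuals observed over periods 0 to 5, the seed and all variable names are illustrative assumptions, not part of the original analysis.

```stata
* Monte Carlo sketch of the small-T bias of the FE estimator (simulated data)
clear
set seed 12345
* N = 2000 individuals, each with an individual effect alpha
set obs 2000
generate id = _n
generate alpha = rnormal()
* six observations per individual: t = 0,...,5, so T = 5
expand 6
bysort id: generate t = _n - 1
xtset id t
* i.i.d. idiosyncratic error with variance 1
generate eps = rnormal()
* start each series from its stationary distribution (true gamma = 0.5)
generate y = alpha/(1 - 0.5) + eps/sqrt(1 - 0.5^2) if t == 0
* build y recursively; replace works through the sorted observations in order
bysort id (t): replace y = 0.5*y[_n-1] + alpha + eps if t > 0
* within (FE) estimator of gamma
xtreg y L.y, fe
```

With these settings the coefficient on L.y should come out clearly below the true value of 0.5, even though N is large.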

To see why the bias and inconsistency arise when \(T\) is fixed and \(N\to \infty\), let us first consider the case where no exogenous variables are included:
\({{y}_{it}}=\gamma {{y}_{i,t-1}}+{{\alpha }_{i}}+{{\varepsilon }_{it}}\),   \(|\gamma |<1\)       (3)

Assume that we have observations on \({{y}_{it}}\) for periods \(t=0,1,\ldots ,T\).
The FE estimator for \(\gamma\) is

\({{\hat{\gamma }}_{FE}}=\frac{\sum\nolimits_{i=1}^{N}{\sum\nolimits_{t=1}^{T}{\left( {{y}_{it}}-{{{\bar{y}}}_{i}} \right)\left( {{y}_{i,t-1}}-{{{\bar{y}}}_{i,-1}} \right)}}}{\sum\nolimits_{i=1}^{N}{\sum\nolimits_{t=1}^{T}{{{\left( {{y}_{i,t-1}}-{{{\bar{y}}}_{i,-1}} \right)}^{2}}}}}\)                (4)

where \({{\bar{y}}_{i}}=\left( 1/T \right)\sum\nolimits_{t=1}^{T}{{{y}_{it}}}\)  and \({{\bar{y}}_{i,-1}}=\left( 1/T \right)\sum\nolimits_{t=1}^{T}{{{y}_{i,t-1}}}\).

The properties of \({{\hat{\gamma }}_{FE}}\) can be shown by substituting Eq(3) into Eq(4):

\({{\hat{\gamma }}_{FE}}=\gamma +\frac{\left( 1/\left( NT \right) \right)\sum\nolimits_{i=1}^{N}{\sum\nolimits_{t=1}^{T}{\left( {{\varepsilon }_{it}}-{{{\bar{\varepsilon }}}_{i}} \right)\left( {{y}_{i,t-1}}-{{{\bar{y}}}_{i,-1}} \right)}}}{\left( 1/\left( NT \right) \right)\sum\nolimits_{i=1}^{N}{\sum\nolimits_{t=1}^{T}{{{\left( {{y}_{i,t-1}}-{{{\bar{y}}}_{i,-1}} \right)}^{2}}}}}\)                                (5)

The FE estimator in Eq(5) is biased and inconsistent for \(N\to \infty \) and fixed \(T\) because the last term on the right-hand side of Eq(5) does not have expectation zero and does not converge to zero as \(N\to \infty \). Nickell (1981) and Hsiao (2003) show that

\(\text{plim}_{N\to \infty }\,\frac{1}{NT}\sum\limits_{i=1}^{N}{\sum\limits_{t=1}^{T}{\left( {{\varepsilon }_{it}}-{{{\bar{\varepsilon }}}_{i}} \right)\left( {{y}_{i,t-1}}-{{{\bar{y}}}_{i,-1}} \right)}}=-\frac{\sigma _{\varepsilon }^{2}}{{{T}^{2}}}\cdot \frac{(T-1)-T\gamma +{{\gamma }^{T}}}{{{(1-\gamma )}^{2}}}\ne 0\)                (6)

Thus, for fixed \(T\) we have an inconsistent estimator. This inconsistency has nothing to do with the \({{\alpha }_{i}}\), as these are eliminated in estimation.

The problem is that the within-transformed lagged dependent variable is correlated with the within-transformed error, as we see in Eq(2) and Eq(5). If \(T\to \infty\), Eq(6) converges to 0, so the FE estimator is consistent for \(\gamma\) if both \(T\to \infty\) and \(N\to \infty \).
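As a rough numerical illustration of Eq(6), take \(\gamma =0.5\) and \(\sigma _{\varepsilon }^{2}=1\) (values assumed here purely for illustration):

\(T=5:\quad -\frac{1}{25}\cdot \frac{4-2.5+{{0.5}^{5}}}{{{0.5}^{2}}}=-\frac{1}{25}\cdot \frac{1.53125}{0.25}=-0.245\)

\(T=10:\quad -\frac{1}{100}\cdot \frac{9-5+{{0.5}^{10}}}{{{0.5}^{2}}}\approx -0.160\)

\(T=100:\quad -\frac{1}{10000}\cdot \frac{99-50+{{0.5}^{100}}}{{{0.5}^{2}}}\approx -0.0196\)

The covariance is negative and, for large \(T\), shrinks roughly in proportion to \(1/T\), which is why the problem matters most in short panels.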

To solve the inconsistency problem, Anderson and Hsiao (1981) proposed an instrumental variable (IV) estimator for \(\gamma \). Let us start with a different transformation, first differencing, to eliminate the individual effects \({{\alpha }_{i}}\):

\({{y}_{it}}-{{y}_{i,t-1}}=\gamma \left( {{y}_{i,t-1}}-{{y}_{i,t-2}} \right)+\left( {{\varepsilon }_{it}}-{{\varepsilon }_{i,t-1}} \right)\)     \(t=2,..T\)                (7)

Estimating Eq(7) by OLS leads to an inconsistent estimator of \(\gamma\) because \({{y}_{i,t-1}}\) and \({{\varepsilon }_{i,t-1}}\) are correlated, even if \(T\to \infty\).

The transformed specification Eq(7) suggests an IV approach. For example, \({{y}_{i,t-2}}\) is correlated with \(\left( {{y}_{i,t-1}}-{{y}_{i,t-2}} \right)\) but not with \({{\varepsilon }_{i,t-1}}\) or \({{\varepsilon }_{it}}\), unless \({{\varepsilon }_{it}}\) exhibits autocorrelation.
This suggests the IV estimator for \(\gamma \) of Anderson and Hsiao (1981):

\({{\hat{\gamma }}_{IV}}=\frac{\sum\nolimits_{i=1}^{N}{\sum\nolimits_{t=2}^{T}{{{y}_{i,t-2}}\left( {{y}_{it}}-{{y}_{i,t-1}} \right)}}}{\sum\nolimits_{i=1}^{N}{\sum\nolimits_{t=2}^{T}{{{y}_{i,t-2}}\left( {{y}_{i,t-1}}-{{y}_{i,t-2}} \right)}}}\)                          (8)

A necessary condition for consistency of the estimator in Eq(8) is

\(\text{plim}\,\frac{1}{N(T-1)}\sum\limits_{i=1}^{N}{\sum\limits_{t=2}^{T}{\left( {{\varepsilon }_{it}}-{{\varepsilon }_{i,t-1}} \right)}{{y}_{i,t-2}}}=0\)              (9)

as \(T\to \infty \), or \(N\to \infty \), or both. Note that \(\left( {{\varepsilon }_{it}}-{{\varepsilon }_{i,t-1}} \right)\) is MA(1).

Anderson and Hsiao (1981) also proposed an alternative, where \(\left( {{y}_{i,t-2}}-{{y}_{i,t-3}} \right)\) is used as an instrument:

\(\hat{\gamma }_{IV}^{\left( 2 \right)}=\frac{\sum\nolimits_{i=1}^{N}{\sum\nolimits_{t=3}^{T}{\left( {{y}_{i,t-2}}-{{y}_{i,t-3}} \right)\left( {{y}_{it}}-{{y}_{i,t-1}} \right)}}}{\sum\nolimits_{i=1}^{N}{\sum\nolimits_{t=3}^{T}{\left( {{y}_{i,t-2}}-{{y}_{i,t-3}} \right)\left( {{y}_{i,t-1}}-{{y}_{i,t-2}} \right)}}}\) (10)

and the condition for consistency of the estimator in Eq(10) is
\(\text{plim}\,\frac{1}{N(T-2)}\sum\limits_{i=1}^{N}{\sum\limits_{t=3}^{T}{\left( {{\varepsilon }_{it}}-{{\varepsilon }_{i,t-1}} \right)}\left( {{y}_{i,t-2}}-{{y}_{i,t-3}} \right)}=0\)                           (11)

The conditions in Eq(9) and Eq(11), and hence consistency of the estimators in Eq(8) and Eq(10), hold as long as \({{\varepsilon }_{it}}\) has no autocorrelation.
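Both Anderson and Hsiao estimators can be tried out on simulated data. The sketch below is only an illustration: it generates an AR(1) panel with a true coefficient of 0.5, and then estimates the first-differenced equation with ivregress, once with the level instrument of Eq(8) and once with the differenced instrument of Eq(10). The data-generating values, seed and variable names are assumptions made for this example.

```stata
* Anderson-Hsiao IV on a simulated AR(1) panel (true gamma = 0.5)
clear
set seed 12345
set obs 2000
generate id = _n
generate alpha = rnormal()
* periods t = 0,...,5
expand 6
bysort id: generate t = _n - 1
xtset id t
generate eps = rnormal()
generate y = alpha/(1 - 0.5) + eps/sqrt(1 - 0.5^2) if t == 0
bysort id (t): replace y = 0.5*y[_n-1] + alpha + eps if t > 0
* Eq(8): instrument the lagged difference LD.y with the level L2.y
ivregress 2sls D.y (LD.y = L2.y), noconstant vce(cluster id)
* Eq(10): instrument LD.y with the lagged difference L2D.y (uses one period less)
ivregress 2sls D.y (LD.y = L2D.y), noconstant vce(cluster id)
```

The second regression reports fewer observations than the first, which is the lost sample period caused by the extra lag in the instrument.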

We see that the IV estimator in Eq(10) requires an additional lag to construct the instrument, so one sample period is ‘lost’. The question is which estimator we should use: Eq(8) or Eq(10)?

This is not an issue, as a method of moments (MM) approach can unify the estimators and eliminate the disadvantage of reduced sample size.

The moment condition for Eq(9) becomes

\(\text{plim}\,\frac{1}{N(T-1)}\sum\limits_{i=1}^{N}{\sum\limits_{t=2}^{T}{\left( {{\varepsilon }_{it}}-{{\varepsilon }_{i,t-1}} \right)}{{y}_{i,t-2}}}=E\left\{ \left( {{\varepsilon }_{it}}-{{\varepsilon }_{i,t-1}} \right){{y}_{i,t-2}} \right\}=0\)                (12)

Similarly, for Eq(11):

\(\text{plim}\,\frac{1}{N(T-2)}\sum\limits_{i=1}^{N}{\sum\limits_{t=3}^{T}{\left( {{\varepsilon }_{it}}-{{\varepsilon }_{i,t-1}} \right)}\left( {{y}_{i,t-2}}-{{y}_{i,t-3}} \right)}=E\left\{ \left( {{\varepsilon }_{it}}-{{\varepsilon }_{i,t-1}} \right)\left( {{y}_{i,t-2}}-{{y}_{i,t-3}} \right) \right\}=0\)                               (13)

Both IV estimators impose one moment condition in estimation.
But, as we know, imposing more moment conditions increases the efficiency of the estimators, provided the additional conditions are valid.
Following this idea, Arellano and Bond (1991) suggest that the list of instruments can be extended by exploiting additional moment conditions and letting their number vary with \(t\). To do this, they keep \(T\) fixed.

Let \(T=4\). Then the moment condition for period \(t=2\) becomes

\(E\left\{ \left( {{\varepsilon }_{i2}}-{{\varepsilon }_{i,1}} \right){{y}_{i0}} \right\}=0\)

which means the variable \({{y}_{i0}}\) is a valid instrument, since it is correlated with \(\left( {{y}_{i1}}-{{y}_{i0}} \right)\) and not correlated with \(\left( {{\varepsilon }_{i2}}-{{\varepsilon }_{i,1}} \right)\).

For \(t=3\), we have

                \(E\left\{ \left( {{\varepsilon }_{i3}}-{{\varepsilon }_{i,2}} \right){{y}_{i1}} \right\}=0\)
and it also holds that

                \(E\left\{ \left( {{\varepsilon }_{i3}}-{{\varepsilon }_{i,2}} \right){{y}_{i0}} \right\}=0\)

where \({{y}_{i0}}\)  and \({{y}_{i1}}\) are correlated with \(\left( {{y}_{i2}}-{{y}_{i1}} \right)\)  and not correlated with \(\left( {{\varepsilon }_{i3}}-{{\varepsilon }_{i,2}} \right)\).

Then, for period \(t=4\), we have three moment conditions and three valid instruments:

\(E\left\{ \left( {{\varepsilon }_{i4}}-{{\varepsilon }_{i,3}} \right){{y}_{i0}} \right\}=0\)
            \(E\left\{ \left( {{\varepsilon }_{i4}}-{{\varepsilon }_{i,3}} \right){{y}_{i1}} \right\}=0\)
\(E\left\{ \left( {{\varepsilon }_{i4}}-{{\varepsilon }_{i,3}} \right){{y}_{i2}} \right\}=0\)

Continuing in this fashion, the set of valid instruments for period \(T\) becomes \(\left( {{y}_{i0}},{{y}_{i1}},\ldots ,{{y}_{i,T-2}} \right)\).
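A quick count shows how fast the instrument set grows. Period \(t\) contributes \(t-1\) moment conditions, one for each of \({{y}_{i0}},\ldots ,{{y}_{i,t-2}}\), so the total number is

\(\sum\limits_{t=2}^{T}{(t-1)}=\frac{T(T-1)}{2}\)

For \(T=4\) this gives \(1+2+3=6\) moment conditions for the single parameter \(\gamma \), that is, five overidentifying restrictions. For \(T=10\) the count is already 45.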

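All these moment conditions can be exploited in a generalized method of moments (GMM) framework. For a general sample size \(T\), the vector of transformed error terms becomes

\(\Delta {{\varepsilon }_{i}}=\left( \begin{matrix} {{\varepsilon }_{i2}}-{{\varepsilon }_{i1}} \\ \vdots  \\ {{\varepsilon }_{iT}}-{{\varepsilon }_{i,T-1}} \end{matrix} \right)\)                         (14)

and the matrix of instruments

\({{Z}_{i}}=\left( \begin{matrix} [{{y}_{i0}}] & 0 & \cdots  & 0 \\ 0 & [{{y}_{i0}},{{y}_{i1}}] & {} & 0 \\ \vdots  & {} & \ddots  & \vdots  \\ 0 & \cdots  & 0 & [{{y}_{i0}},\ldots ,{{y}_{i,T-2}}] \end{matrix} \right)\)                         (15)

Each row in the matrix \({{Z}_{i}}\) contains the instruments that are valid for a given period. Consequently, the set of all moment conditions can be written as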

                \(E\left\{ {{{{Z}'}}_{i}}\Delta {{\varepsilon }_{i}} \right\}=0\)                         (16)

To derive the GMM estimator, write Eq(16) as

\(E\left\{ {{{{Z}'}}_{i}}\left( \Delta {{y}_{i}}-\gamma \Delta {{y}_{i,-1}} \right) \right\}=0\)              (17)

Typically, the number of moment conditions will exceed the number of unknown parameters, so we estimate \(\gamma \) by minimizing a quadratic expression in terms of the corresponding sample moments:

\(\underset{\gamma }{\mathop{\min }}\,\text{ }{{\left[ \frac{1}{N}\sum\limits_{i=1}^{N}{{{{{Z}'}}_{i}}}\left( \Delta {{y}_{i}}-\gamma \Delta {{y}_{i,-1}} \right) \right]}^{\prime }}{{W}_{N}}\left[ \frac{1}{N}\sum\limits_{i=1}^{N}{{{{{Z}'}}_{i}}}\left( \Delta {{y}_{i}}-\gamma \Delta {{y}_{i,-1}} \right) \right]\)     (18)

where \({{W}_{N}}\) is a symmetric positive definite weighting matrix. Differentiating Eq(18) with respect to \(\gamma \) and solving for \(\gamma\) gives the GMM estimator:

\({{\hat{\gamma }}_{GMM}}={{\left( \left( \sum\limits_{i=1}^{N}{\Delta {{{{y}'}}_{i,-1}}{{Z}_{i}}} \right){{W}_{N}}\left( \sum\limits_{i=1}^{N}{{{Z}_{i}}^{\prime }\Delta {{y}_{i,-1}}} \right) \right)}^{-1}}\times \left( \sum\limits_{i=1}^{N}{\Delta {{{{y}'}}_{i,-1}}{{Z}_{i}}} \right){{W}_{N}}\left( \sum\limits_{i=1}^{N}{{{Z}_{i}}^{\prime }\Delta {{y}_{i}}} \right)\)                             (19)

In general, the GMM approach does not impose that \({{\varepsilon }_{it}}\) is i.i.d. over individuals and time. Note, however, that the absence of autocorrelation is needed to guarantee that the moment conditions are valid. It can therefore be advisable, especially in small samples, to impose the absence of autocorrelation in \({{\varepsilon }_{it}}\), combined with a homoskedasticity assumption, when constructing the weighting matrix.
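To make the role of these assumptions concrete, here is the standard textbook form of the two weighting matrices; the symbol \(G\) is introduced here only for this illustration. If \({{\varepsilon }_{it}}\) is homoskedastic and not autocorrelated, the first-differenced errors are MA(1) with

\(E\left\{ \Delta {{\varepsilon }_{i}}\Delta {{{{\varepsilon }'}}_{i}} \right\}=\sigma _{\varepsilon }^{2}G=\sigma _{\varepsilon }^{2}\left( \begin{matrix} 2 & -1 & 0 & \cdots  \\ -1 & 2 & \ddots  & {} \\ 0 & \ddots  & \ddots  & -1 \\ \vdots  & {} & -1 & 2 \end{matrix} \right)\)

and the weighting matrix can be computed in one step, without any first-stage estimate:

\({{W}_{N}}={{\left( \frac{1}{N}\sum\limits_{i=1}^{N}{{{{{Z}'}}_{i}}G{{Z}_{i}}} \right)}^{-1}}\)

Without these restrictions, the weighting matrix is instead built from the residuals \(\Delta {{\hat{\varepsilon }}_{i}}\) of a consistent first-step estimator:

\({{W}_{N}}={{\left( \frac{1}{N}\sum\limits_{i=1}^{N}{{{{{Z}'}}_{i}}\Delta {{{\hat{\varepsilon }}}_{i}}\Delta {{{{\hat{\varepsilon }}'}}_{i}}{{Z}_{i}}} \right)}^{-1}}\)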

Alvarez and Arellano (2003) show that, in general, the GMM estimator is also consistent when both \(T\to \infty\) and \(N\to \infty \). But for large \(T\) the GMM estimator will be close to the FE estimator, which then provides a more attractive alternative.



Estimation with Stata
For dynamic panel estimation, we use the abdata.dta dataset.
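A minimal way to load and inspect the data, assuming an internet connection so that the copy of abdata shipped with Stata's example datasets can be fetched with webuse:

```stata
* load the Arellano-Bond example data and declare the panel structure
webuse abdata, clear
xtset id year
* pattern of observations per firm and basic summary statistics
xtdescribe
summarize n w k ys
```

The xtdescribe output is a quick check on the number of firms and on how many years each firm is observed.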

The aim is to estimate a model for employment in a panel of UK companies; the model is based on Arellano and Bond (1991):

\({{n}_{it}}={{\alpha }_{1}}{{n}_{i,t-1}}+{{\alpha }_{2}}{{n}_{i,t-2}}+{{\beta }_{1}}{{w}_{it}}+{{\beta }_{2}}{{w}_{i,t-1}}+{{\beta }_{3}}{{k}_{it}}+{{\beta }_{4}}{{k}_{i,t-1}}+{{\beta }_{5}}{{k}_{i,t-2}}+{{\beta }_{6}}y{{s}_{it}}+{{\beta }_{7}}y{{s}_{i,t-1}}+{{\beta }_{8}}y{{s}_{i,t-2}}+{{\lambda }_{t}}+{{u}_{i}}+{{\varepsilon }_{it}}\)                                 (20)
where

\(n\)      = log of employment
\(w\)     = log of per-employee real wage
\(k\)      = log of gross capital stock
\(ys\)    = log of output of each industry
\({{\lambda }_{t}}\)= time effect
\({{u}_{i}}\)= time-invariant unobservable
\({{\varepsilon }_{it}}\) = idiosyncratic error

with \(T=7\) and \(N=140\).
 
Let us now estimate the model in Eq(20) by IV, based on Anderson and Hsiao (1981). We use the xtivreg command:

xtivreg n (L(1/2).n L(0/1).w L(0/2).(k ys) yr1980-yr1984 = L(2/3).n L(0/1).w L(0/2).(k ys) yr1980-yr1984),fd nocons

 



Now let us estimate Eq(20) again, this time with the Arellano and Bond (1991) GMM method, using xtabond:

xtabond n L(0/1).w L(0/2).(k ys) yr1979-yr1984 year, lags(2) vce(robust)


 

 

It is important to test H0: no second-order autocorrelation in the first-differenced errors, i.e. the dynamics are correctly specified. Of course, H0: no first-order autocorrelation is expected to be rejected, because the errors of the first-differenced equation are MA(1).

To perform the Arellano-Bond test for first- and second-order autocorrelation in the first-differenced errors:

estat abond

 


The results of the Arellano-Bond test show no evidence that the model is misspecified (no autocorrelation at second order).

Besides xtabond, we can also use the user-written xtabond2 command by Roodman (2009) to obtain exactly the same results.

xtabond2 n L(1/2).n L(0/1).w L(0/2).(k ys) yr1979-yr1984,gmm(n,laglimits(2 .)) iv(L(0/1).w L(0/2).(k ys) yr1979-yr1984) noleveleq
 
 

The results from xtabond2 likewise show no evidence that the model is misspecified (no autocorrelation at second order).

For the Sargan test of overidentifying restrictions, only for a homoskedastic error term does the Sargan test have an asymptotic chi-squared distribution. In fact, Arellano and Bond (1991) show that the one-step Sargan test overrejects in the presence of heteroskedasticity. The results above present strong evidence against the null hypothesis that the overidentifying restrictions are valid, that is, that the population moment conditions are correct. Rejecting this null hypothesis implies that we need to reconsider our model or our instruments, unless the rejection is driven by heteroskedasticity.
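With xtabond, the Sargan test is obtained with estat sargan, which is not available after vce(robust), so the model has to be refitted with conventional standard errors. The sketch below reuses the regressor list of the xtabond command above and compares the one-step and two-step versions of the test; treat it as an illustration rather than as the exact commands behind the results reported here.

```stata
webuse abdata, clear
* one-step estimator with conventional (non-robust) standard errors
xtabond n L(0/1).w L(0/2).(k ys) yr1979-yr1984 year, lags(2)
estat sargan
* two-step estimator: its Sargan statistic is less affected by heteroskedasticity
xtabond n L(0/1).w L(0/2).(k ys) yr1979-yr1984 year, lags(2) twostep
estat sargan
```

If the one-step test rejects but the two-step test does not, heteroskedasticity is a plausible explanation for the one-step rejection.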

Some points to note:

1.       First, all the explanatory variables other than the lagged dependent variable are assumed to be strictly exogenous, i.e. \(E\left( {{x}_{it}}{{\varepsilon }_{is}} \right)=0\) , \(\forall t,s=1,...,T,\forall i=1,...,N.\)
2.       The gmm option specifies a set of variables to be used as bases for “GMM-style” instrument sets, as described in Holtz-Eakin, Newey and Rosen (1988) and Arellano and Bond (1991).
3.       By default, xtabond2 uses, for each time period, all available lags of the specified variables in levels dated \(t-1\) or earlier as instruments for the first-difference equation, and the contemporaneous first differences as instruments in the levels equation.
4.       The suboption laglimits(a b) can override these defaults. For the first-difference equation, lagged levels dated \(t-a\) to \(t-b\) are used as instruments. For the levels equation, the first difference dated \(t-a+1\) is normally used. Note that \(a\) and \(b\) can each be missing (.), in which case \(a\) defaults to 1 and \(b\) to infinity; they can even be negative, implying “forward” lags (Arellano and Bover, 1995).
5.       There are different ways of writing gmm for eq(diff). For example, the use of \({{y}_{i,t-2}}\) and its earlier lags as IVs for \({{y}_{i,t-1}}\) in the equation in first differences can be written as gmm(y,laglimits(2 .)), or gmm(L2.y,laglimits(0 .)), or gmm(L.y,laglimits(1 .)), or the default gmm(L.y).
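The equivalence of the forms in point 5 can be checked directly. The sketch below fits a deliberately stripped-down model (employment on its own first lag only, an assumption made to keep the example short) four times, changing only how the GMM-style instruments are written. It assumes xtabond2 has been installed, for example with ssc install xtabond2.

```stata
* requires the user-written command xtabond2 (ssc install xtabond2)
webuse abdata, clear
* four ways of asking for lags 2 and earlier of n as instruments
xtabond2 n L.n, gmm(n, laglimits(2 .)) noleveleq robust
xtabond2 n L.n, gmm(L2.n, laglimits(0 .)) noleveleq robust
xtabond2 n L.n, gmm(L.n, laglimits(1 .)) noleveleq robust
xtabond2 n L.n, gmm(L.n) noleveleq robust
```

If the forms are equivalent, as stated above, the four runs should report the same number of instruments and the same coefficient on L.n.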