Many economic relationships are dynamic in nature, and one of the advantages of panel data is that it allows the researcher to better understand the dynamics of adjustment. Many economic models suggest that current behavior depends upon past behavior, so in many cases we would like to estimate a dynamic model at the individual level. The ability to do so is unique to panel data.
A dynamic panel data model is characterized by the presence of a lagged dependent variable among the regressors:
\(y_{it}=\gamma y_{i,t-1}+x'_{it}\beta +\alpha_{i}+\varepsilon_{it}\) (1)
where it is assumed that \(\varepsilon_{it}\) is \(IID\left( 0,\sigma_{\varepsilon}^{2} \right)\).
The basic problem with including a lagged dependent variable in the model is the following.
1) For the FE estimator:
\(y_{it}-\bar{y}_{i}=\gamma \left( y_{i,t-1}-\bar{y}_{i,-1} \right)+\left( x_{it}-\bar{x}_{i} \right)'\beta +\left( \varepsilon_{it}-\bar{\varepsilon}_{i} \right)\) (2)
the within transformation wipes out the \(\alpha_{i}\), but \(\left( y_{i,t-1}-\bar{y}_{i,-1} \right)\) will be correlated with \(\left( \varepsilon_{it}-\bar{\varepsilon}_{i} \right)\) even if the \(\varepsilon_{it}\) are not serially correlated. This is because \(y_{i,t-1}\) is correlated with \(\bar{\varepsilon}_{i}\): the latter average contains \(\varepsilon_{i,t-1}\), which is obviously correlated with \(y_{i,t-1}\).
2) For the RE estimator, in order to apply GLS, quasi-demeaning is performed, and \(\left( y_{i,t-1}-\theta \bar{y}_{i,-1} \right)\) will be correlated with \(\left( \varepsilon_{it}-\theta \bar{\varepsilon}_{i} \right)\).
That means that, for a dynamic panel data model, the estimator is biased and inconsistent whether the effects are treated as fixed or random. This bias is of order \(1/T\) and disappears only if \(T\to\infty\). The bias can be serious when \(T\) is small and \(N\to\infty\).
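To get a feel for the size of this bias, here is a minimal simulation sketch; the design choices (N, T, \(\gamma\), normal errors) are my own assumptions, not from the text:

```python
import numpy as np

# Within (FE) estimation of y_it = gamma*y_{i,t-1} + alpha_i + eps_it
# with small T and large N, to illustrate the Nickell bias.
rng = np.random.default_rng(0)
N, T, gamma = 2000, 5, 0.5           # T small, N large

alpha = rng.normal(size=N)           # individual effects
y = np.zeros((N, T + 1))
y[:, 0] = alpha / (1 - gamma) + rng.normal(size=N)  # start near each steady state
for t in range(1, T + 1):
    y[:, t] = gamma * y[:, t - 1] + alpha + rng.normal(size=N)

# Within transformation: demean y_it and y_{i,t-1} over t = 1..T
y_t   = y[:, 1:]                     # y_i1 .. y_iT
y_lag = y[:, :-1]                    # y_i0 .. y_i,T-1
yd  = y_t   - y_t.mean(axis=1, keepdims=True)
ydl = y_lag - y_lag.mean(axis=1, keepdims=True)

gamma_fe = (yd * ydl).sum() / (ydl ** 2).sum()
print(gamma_fe)                      # noticeably below the true value 0.5
```

Even with N = 2000 individuals, the within estimate falls well short of the true \(\gamma=0.5\) because \(T\) is small.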
To see why the bias and inconsistency exist when \(T\) is fixed and \(N\to\infty\), let us first consider the case where no exogenous variables are included:
\(y_{it}=\gamma y_{i,t-1}+\alpha_{i}+\varepsilon_{it}\), \(|\gamma |<1\) (3)
Assume that we have observations on \(y_{it}\) for periods \(t=0,1,\ldots,T\).
The FE estimator for \(\gamma\) is:
\(\hat{\gamma}_{FE}=\frac{\sum\nolimits_{i=1}^{N}\sum\nolimits_{t=1}^{T}\left( y_{it}-\bar{y}_{i} \right)\left( y_{i,t-1}-\bar{y}_{i,-1} \right)}{\sum\nolimits_{i=1}^{N}\sum\nolimits_{t=1}^{T}\left( y_{i,t-1}-\bar{y}_{i,-1} \right)^{2}}\) (4)
where \(\bar{y}_{i}=\left( 1/T \right)\sum\nolimits_{t=1}^{T}y_{it}\) and \(\bar{y}_{i,-1}=\left( 1/T \right)\sum\nolimits_{t=1}^{T}y_{i,t-1}\).
The properties of \(\hat{\gamma}_{FE}\) can be shown by substituting Eq(3) into Eq(4):
\(\hat{\gamma}_{FE}=\gamma +\frac{\left( 1/\left( NT \right) \right)\sum\nolimits_{i=1}^{N}\sum\nolimits_{t=1}^{T}\left( \varepsilon_{it}-\bar{\varepsilon}_{i} \right)\left( y_{i,t-1}-\bar{y}_{i,-1} \right)}{\left( 1/\left( NT \right) \right)\sum\nolimits_{i=1}^{N}\sum\nolimits_{t=1}^{T}\left( y_{i,t-1}-\bar{y}_{i,-1} \right)^{2}}\) (5)
The FE estimator in Eq(5) is biased and inconsistent for \(N\to \infty\) and fixed \(T\), because the last term on the right-hand side of Eq(5) does not have expectation zero and does not converge to zero as \(N\to \infty\). Nickell (1981) and Hsiao (2003) show that:
\(\text{plim}_{N\to\infty}\frac{1}{NT}\sum\limits_{i=1}^{N}\sum\limits_{t=1}^{T}\left( \varepsilon_{it}-\bar{\varepsilon}_{i} \right)\left( y_{i,t-1}-\bar{y}_{i,-1} \right)=-\frac{\sigma_{\varepsilon}^{2}}{T^{2}}\cdot \frac{(T-1)-T\gamma +\gamma^{T}}{(1-\gamma)^{2}}\ne 0\) (6)
Thus, for fixed \(T\) we have an inconsistent estimator. This inconsistency has nothing to do with \(\alpha_{i}\), as these are eliminated in estimation. The problem is that the within-transformed lagged dependent variable is correlated with the within-transformed error, as we see in Eq(2) and Eq(5). If \(T\to \infty\), Eq(6) converges to 0, so the FE estimator is consistent for \(\gamma\) if both \(T\to \infty\) and \(N\to \infty\).
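The right-hand side of Eq(6) can be evaluated directly; a small sketch (assuming \(\sigma_{\varepsilon}^{2}=1\), which is my own normalization):

```python
# Evaluate the Nickell bias term in Eq(6) for gamma = 0.5 and several T.
def nickell_term(gamma, T, sigma2=1.0):
    # -(sigma^2 / T^2) * ((T-1) - T*gamma + gamma^T) / (1-gamma)^2
    return -(sigma2 / T**2) * ((T - 1) - T * gamma + gamma**T) / (1 - gamma)**2

for T in (5, 10, 50, 200):
    print(T, nickell_term(0.5, T))
# The term is negative for every T and approaches zero as T grows,
# matching the claim that the FE estimator is consistent only as T -> infinity.
```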
To solve the inconsistency problem, Anderson and Hsiao (1981) proposed an instrumental variable (IV) estimator for \(\gamma\). Let us start with a different transformation that eliminates the individual effects \(\alpha_{i}\): first differences;
\(y_{it}-y_{i,t-1}=\gamma \left( y_{i,t-1}-y_{i,t-2} \right)+\left( \varepsilon_{it}-\varepsilon_{i,t-1} \right)\), \(t=2,\ldots,T\) (7)
Estimating Eq(7) by OLS leads to an inconsistent estimator of \(\gamma\), because \(y_{i,t-1}\) and \(\varepsilon_{i,t-1}\) are correlated, even if \(T\to \infty\).
The transformed specification in Eq(7) suggests an IV approach. For example, \(y_{i,t-2}\) is correlated with \(\left( y_{i,t-1}-y_{i,t-2} \right)\) but not with \(\left( \varepsilon_{it}-\varepsilon_{i,t-1} \right)\), provided the \(\varepsilon_{it}\) are not serially correlated. This suggests the IV estimator for \(\gamma\) of Anderson and Hsiao (1981):
\(\hat{\gamma}_{IV}=\frac{\sum\nolimits_{i=1}^{N}\sum\nolimits_{t=2}^{T}y_{i,t-2}\left( y_{it}-y_{i,t-1} \right)}{\sum\nolimits_{i=1}^{N}\sum\nolimits_{t=2}^{T}y_{i,t-2}\left( y_{i,t-1}-y_{i,t-2} \right)}\) (8)
The condition for consistency of the estimator in Eq(8) is:
\(\text{plim}\frac{1}{N(T-1)}\sum\limits_{i=1}^{N}\sum\limits_{t=2}^{T}\left( \varepsilon_{it}-\varepsilon_{i,t-1} \right)y_{i,t-2}=0\) (9)
as \(T\to \infty\), or \(N\to \infty\), or both. Note that \(\left( \varepsilon_{it}-\varepsilon_{i,t-1} \right)\) is MA(1).
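A quick numerical sketch of the estimator in Eq(8); the simulation design (N, T, \(\gamma\), normal errors) is my own assumption, not from the text:

```python
import numpy as np

# Anderson-Hsiao IV: first differences instrumented by the level y_{i,t-2}.
rng = np.random.default_rng(1)
N, T, gamma = 5000, 6, 0.5
alpha = rng.normal(size=N)
y = np.zeros((N, T + 1))
y[:, 0] = alpha / (1 - gamma) + rng.normal(size=N)
for t in range(1, T + 1):
    y[:, t] = gamma * y[:, t - 1] + alpha + rng.normal(size=N)

# Differences for t = 2..T, with the level y_{i,t-2} as instrument
dy   = y[:, 2:] - y[:, 1:-1]         # y_it - y_{i,t-1}
dy_l = y[:, 1:-1] - y[:, :-2]        # y_{i,t-1} - y_{i,t-2}
z    = y[:, :-2]                     # y_{i,t-2}

gamma_iv = (z * dy).sum() / (z * dy_l).sum()
print(gamma_iv)                      # close to the true value 0.5
```

Unlike the FE estimate, this estimate lands near the true \(\gamma\), because the instrument is uncorrelated with the differenced error.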
Anderson and Hsiao (1981) also proposed an alternative, where \(\left( y_{i,t-2}-y_{i,t-3} \right)\) is used as an instrument:
\(\hat{\gamma}_{IV}^{\left( 2 \right)}=\frac{\sum\nolimits_{i=1}^{N}\sum\nolimits_{t=3}^{T}\left( y_{i,t-2}-y_{i,t-3} \right)\left( y_{it}-y_{i,t-1} \right)}{\sum\nolimits_{i=1}^{N}\sum\nolimits_{t=3}^{T}\left( y_{i,t-2}-y_{i,t-3} \right)\left( y_{i,t-1}-y_{i,t-2} \right)}\) (10)
and the condition for consistency of the estimator in Eq(10) is:
\(\text{plim}\frac{1}{N(T-2)}\sum\limits_{i=1}^{N}\sum\limits_{t=3}^{T}\left( \varepsilon_{it}-\varepsilon_{i,t-1} \right)\left( y_{i,t-2}-y_{i,t-3} \right)=0\) (11)
Consistency of both Eq(8) and Eq(10) is guaranteed as long as \(\varepsilon_{it}\) exhibits no autocorrelation.
Note that in Eq(10) the IV estimator requires an additional lag to construct the instrument, which leads to the loss of one sample period. So which estimator should we use, Eq(8) or Eq(10)? This is not really an issue, as a method of moments (MM) approach can unify the estimators and eliminate the disadvantage of the reduced sample size.
The moment condition corresponding to Eq(9) becomes:
\(\text{plim}\frac{1}{N(T-1)}\sum\limits_{i=1}^{N}\sum\limits_{t=2}^{T}\left( \varepsilon_{it}-\varepsilon_{i,t-1} \right)y_{i,t-2}=E\left\{ \left( \varepsilon_{it}-\varepsilon_{i,t-1} \right)y_{i,t-2} \right\}=0\) (12)
Similarly, for Eq(11):
\(\text{plim}\frac{1}{N(T-2)}\sum\limits_{i=1}^{N}\sum\limits_{t=3}^{T}\left( \varepsilon_{it}-\varepsilon_{i,t-1} \right)\left( y_{i,t-2}-y_{i,t-3} \right)=E\left\{ \left( \varepsilon_{it}-\varepsilon_{i,t-1} \right)\left( y_{i,t-2}-y_{i,t-3} \right) \right\}=0\) (13)
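The moment condition in Eq(12) can be checked numerically; in this sketch the simulation design (N, T, \(\gamma\), normal errors) is my own assumption:

```python
import numpy as np

# Sample analogue of E{(eps_it - eps_{i,t-1}) * y_{i,t-2}} under i.i.d. errors.
rng = np.random.default_rng(5)
N, T, gamma = 100_000, 4, 0.5
alpha = rng.normal(size=N)
eps = rng.normal(size=(N, T + 1))
y = np.zeros((N, T + 1))
y[:, 0] = alpha / (1 - gamma) + eps[:, 0]
for t in range(1, T + 1):
    y[:, t] = gamma * y[:, t - 1] + alpha + eps[:, t]

de = eps[:, 2:] - eps[:, 1:-1]       # eps_it - eps_{i,t-1} for t = 2..T
m = np.mean(de * y[:, :-2])          # multiplied by y_{i,t-2}
print(m)                             # approximately zero
```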
Both IV estimators impose one moment condition in estimation, but, as we know, imposing more moment conditions increases the efficiency of the estimators.
Following this idea, Arellano and Bond (1991) suggested that the list of instruments can be extended by exploiting additional moment conditions and letting their number vary with \(t\). To do this, they keep \(T\) fixed.
Let \(T=4\). Then the moment condition for period \(t=2\) becomes:
\(E\left\{ \left( \varepsilon_{i2}-\varepsilon_{i1} \right)y_{i0} \right\}=0\)
which means the variable \(y_{i0}\) is a valid instrument, since it is highly correlated with \(\left( y_{i1}-y_{i0} \right)\) and not correlated with \(\left( \varepsilon_{i2}-\varepsilon_{i1} \right)\).
For \(t=3\), we have
\(E\left\{ \left( \varepsilon_{i3}-\varepsilon_{i2} \right)y_{i1} \right\}=0\)
and it also holds that
\(E\left\{ \left( \varepsilon_{i3}-\varepsilon_{i2} \right)y_{i0} \right\}=0\)
where \(y_{i0}\) and \(y_{i1}\) are correlated with \(\left( y_{i2}-y_{i1} \right)\) and not correlated with \(\left( \varepsilon_{i3}-\varepsilon_{i2} \right)\).
Then, for period \(t=4\), we have three moment conditions and three valid instruments:
\(E\left\{ \left( \varepsilon_{i4}-\varepsilon_{i3} \right)y_{i0} \right\}=0\)
\(E\left\{ \left( \varepsilon_{i4}-\varepsilon_{i3} \right)y_{i1} \right\}=0\)
\(E\left\{ \left( \varepsilon_{i4}-\varepsilon_{i3} \right)y_{i2} \right\}=0\)
One can continue in this fashion; for period \(t=T\), the set of valid instruments becomes \(\left( y_{i0},y_{i1},\ldots,y_{i,T-2} \right)\).
All these moment conditions can be exploited in a Generalized Method of Moments (GMM) framework. For a general sample size \(T\), the vector of transformed error terms becomes:
\(\Delta \varepsilon_{i}=\left( \varepsilon_{i2}-\varepsilon_{i1},\ldots,\varepsilon_{iT}-\varepsilon_{i,T-1} \right)'\) (14)
and the matrix of instruments:
\(Z_{i}=\begin{pmatrix} y_{i0} & 0 & \cdots & 0 \\ 0 & \left( y_{i0},y_{i1} \right) & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \left( y_{i0},\ldots,y_{i,T-2} \right) \end{pmatrix}\) (15)
Each row of the matrix \(Z_{i}\) contains the instruments that are valid for a given period. Consequently, the set of all moment conditions can be written as
\(E\left\{ Z'_{i}\Delta \varepsilon_{i} \right\}=0\) (16)
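The block-diagonal instrument matrix can be built mechanically; a sketch, where the function name and the toy series are my own choices:

```python
import numpy as np

# Build Z_i for one individual: the row for period t (t = 2..T) holds
# the instruments (y_i0, ..., y_{i,t-2}) in its own block of columns.
def build_Z(y_i):                    # y_i = (y_i0, ..., y_iT)
    T = len(y_i) - 1
    n_inst = T * (T - 1) // 2        # 1 + 2 + ... + (T-1) instruments
    Z = np.zeros((T - 1, n_inst))
    col = 0
    for t in range(2, T + 1):        # equation for period t
        Z[t - 2, col:col + t - 1] = y_i[:t - 1]   # y_i0 .. y_{i,t-2}
        col += t - 1
    return Z

y_i = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # toy series with T = 4
print(build_Z(y_i))
# Row for t=2 uses y_i0; row for t=3 uses (y_i0, y_i1); row for t=4 uses (y_i0, y_i1, y_i2)
```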
To derive the GMM estimator, write Eq(16) as
\(E\left\{ Z'_{i}\left( \Delta y_{i}-\gamma \Delta y_{i,-1} \right) \right\}=0\) (17)
Typically, the number of moment conditions will exceed the number of unknown parameters, and we estimate \(\gamma\) by minimizing a quadratic expression in terms of the corresponding sample moments:
\(\min_{\gamma}\ \left[ \frac{1}{N}\sum\limits_{i=1}^{N}Z'_{i}\left( \Delta y_{i}-\gamma \Delta y_{i,-1} \right) \right]'W_{N}\left[ \frac{1}{N}\sum\limits_{i=1}^{N}Z'_{i}\left( \Delta y_{i}-\gamma \Delta y_{i,-1} \right) \right]\) (18)
where \(W_{N}\) is a symmetric positive definite weighting matrix. Differentiating Eq(18) with respect to \(\gamma\) and solving for \(\gamma\) gives the GMM estimator:
\(\hat{\gamma}_{GMM}={{\left( \left( \sum\limits_{i=1}^{N}\Delta y'_{i,-1}Z_{i} \right)W_{N}\left( \sum\limits_{i=1}^{N}Z'_{i}\Delta y_{i,-1} \right) \right)}^{-1}}\left( \sum\limits_{i=1}^{N}\Delta y'_{i,-1}Z_{i} \right)W_{N}\left( \sum\limits_{i=1}^{N}Z'_{i}\Delta y_{i} \right)\) (19)
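As a numerical sketch of Eq(19): the simulation design and the simple choice \(W_{N}=\left( \frac{1}{N}\sum Z'_{i}Z_{i} \right)^{-1}\) are my own assumptions here (this is not the optimal one-step Arellano-Bond weighting matrix):

```python
import numpy as np

# GMM estimation of gamma in first differences, using all lagged levels
# as instruments, on simulated data from y_it = gamma*y_{i,t-1} + alpha_i + eps_it.
rng = np.random.default_rng(2)
N, T, gamma = 3000, 5, 0.5
alpha = rng.normal(size=N)
y = np.zeros((N, T + 1))
y[:, 0] = alpha / (1 - gamma) + rng.normal(size=N)
for t in range(1, T + 1):
    y[:, t] = gamma * y[:, t - 1] + alpha + rng.normal(size=N)

def build_Z(y_i):                    # block-diagonal instrument matrix
    Tn = len(y_i) - 1
    Z = np.zeros((Tn - 1, Tn * (Tn - 1) // 2))
    col = 0
    for t in range(2, Tn + 1):
        Z[t - 2, col:col + t - 1] = y_i[:t - 1]
        col += t - 1
    return Z

# Accumulate the sums appearing in Eq(19)
A = 0.0; B = 0.0; ZZ = 0.0
for i in range(N):
    Zi  = build_Z(y[i])
    dyi = (y[i, 2:] - y[i, 1:-1])[:, None]   # Delta y_i
    dyl = (y[i, 1:-1] - y[i, :-2])[:, None]  # Delta y_{i,-1}
    A  += Zi.T @ dyl                         # sum Z_i' Delta y_{i,-1}
    B  += Zi.T @ dyi                         # sum Z_i' Delta y_i
    ZZ += Zi.T @ Zi
W = np.linalg.inv(ZZ / N)                    # simple weighting matrix (assumed)

gamma_gmm = (A.T @ W @ B).item() / (A.T @ W @ A).item()
print(gamma_gmm)                             # close to the true value 0.5
```

Any positive definite \(W_{N}\) yields a consistent estimator; the choice of \(W_{N}\) only affects efficiency.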
The GMM approach does not impose that \(\varepsilon_{it}\) is i.i.d. over individuals and time, but note that the absence of autocorrelation was needed to guarantee that the moment conditions are valid. It is therefore advisable (in small samples) to impose the absence of autocorrelation in \(\varepsilon_{it}\), combined with a homoskedasticity assumption.
Alvarez and Arellano (2003) show that, in general, the GMM estimator is also consistent when both \(T\to \infty\) and \(N\to \infty\). However, for large \(T\) the GMM estimator will be close to the FE estimator, which then provides a more attractive alternative.
Estimation with Stata
For dynamic panel estimation, we use the abdata.dta dataset. The aim is to estimate a model for employment in a panel of UK companies; the estimated model is based on Arellano and Bond (1991):
\(n_{it}=\alpha_{1}n_{i,t-1}+\alpha_{2}n_{i,t-2}+\beta_{1}w_{it}+\beta_{2}w_{i,t-1}+\beta_{3}k_{it}+\beta_{4}k_{i,t-1}+\beta_{5}k_{i,t-2}+\beta_{6}ys_{it}+\beta_{7}ys_{i,t-1}+\beta_{8}ys_{i,t-2}+\lambda_{t}+u_{i}+\varepsilon_{it}\) (20)
where:
\(n\) = log of employment
\(w\) = log of real wage per employee
\(k\) = log of gross capital stock
\(ys\) = log of industry output
\(\lambda_{t}\) = time effect
\(u_{i}\) = time-invariant unobservable
\(\varepsilon_{it}\) = idiosyncratic error
with \(T=7\) and \(N=140\).
Let us now estimate the model in Eq(20) with the IV estimator of Anderson and Hsiao (1981), using the xtivreg command:
xtivreg n (L(1/2).n L(0/1).w L(0/2).(k ys) yr1980-yr1984 = L(2/3).n L(0/1).w L(0/2).(k ys) yr1980-yr1984), fd nocons
Now let us estimate Eq(20) again, but with the GMM method of Arellano and Bond (1991):
xtabond n L(0/1).w L(0/2).(k ys) yr1979-yr1984 year, lags(2) vce(robust)
It is important to test H0: errors not correlated at the second order, i.e. the dynamics are correctly specified. Of course, H0: errors not correlated at the first order is always rejected, because the errors in the first-differenced equation are MA(1).
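The MA(1) claim is easy to verify numerically; a sketch with i.i.d. simulated errors (the simulation itself is my own illustration):

```python
import numpy as np

# If eps_t is i.i.d., the first differences eps_t - eps_{t-1} are MA(1):
# first-order autocorrelation -0.5, higher orders zero.
rng = np.random.default_rng(3)
eps = rng.normal(size=1_000_000)
d = eps[1:] - eps[:-1]

def acorr(x, lag):
    return np.corrcoef(x[:-lag], x[lag:])[0, 1]

print(acorr(d, 1))   # approximately -0.5
print(acorr(d, 2))   # approximately 0
```

This is why rejecting AR(1) in the Arellano-Bond test output is expected, while rejecting AR(2) would signal misspecification.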
To perform the Arellano-Bond test for first- and second-order autocorrelation in the first-differenced errors:
estat abond
The results of the Arellano-Bond test show that our estimation does not present evidence that the model is misspecified (no autocorrelation at the second order).
Besides xtabond, we can also use the user-written xtabond2 command by Roodman (2009) to get exactly the same results.
xtabond2 n L(1/2).n L(0/1).w L(0/2).(k ys) yr1979-yr1984, gmm(n, laglimits(2 .)) iv(L(0/1).w L(0/2).(k ys) yr1979-yr1984) noleveleq
The results from xtabond2 likewise show that our estimation does not present evidence that the model is misspecified (no autocorrelation at the second order).
For the Sargan test of overidentifying restrictions, the test statistic has an asymptotic chi-squared distribution only for homoskedastic error terms. In fact, Arellano and Bond (1991) show that the one-step Sargan test overrejects in the presence of heteroskedasticity. The results above present strong evidence against the null hypothesis that the overidentifying restrictions are valid, i.e. that the population moment conditions are correct. Rejecting this null hypothesis implies that we need to reconsider our model or our instruments.
Some considerations to take note of:
1. First, all the explanatory variables other than the lagged dependent variable are assumed to be strictly exogenous, i.e. \(E\left( x_{it}\varepsilon_{is} \right)=0\), \(\forall t,s=1,\ldots,T,\ \forall i=1,\ldots,N\).
2. The gmm option specifies a set of variables to be used as bases for the "GMM-style" instrument sets described in Holtz-Eakin, Newey and Rosen (1988) and Arellano and Bond (1991).
3. By default, xtabond2 uses, for each time period, all available lags of the specified variables in levels dated \(t-1\) or earlier as instruments for the first-difference equation, and the contemporaneous first differences as instruments in the levels equation.
4. The suboption laglimits(a b) can override these defaults. For the first-difference equations, lagged levels dated \(t-a\) to \(t-b\) are used as instruments. For the levels equation, the first difference dated \(t-a+1\) is normally used. Note that \(a\) and \(b\) can each be missing, indicating no limit (infinity); they can even be negative, implying "forward" lags (Arellano and Bover, 1995).
5. There are different ways of writing gmm() for the first-difference equation. For example, the use of \(y_{i,t-2}\) and its lags as IVs for \(y_{i,t-1}\) in the equation in first differences can be written as gmm(y, laglimits(2 .)), or gmm(L2.y, laglimits(0 .)), or gmm(L.y, laglimits(1 .)), or the default gmm(L.y).