
Friday, 27 October 2017

HAUSMAN-TAYLOR ESTIMATION WITH STATA (PANEL)





The FE and FD estimators are consistent, but they cannot estimate the coefficients on time-invariant regressors, which are not identified: the FE (within) transformation, like first-differencing, eliminates anything that is time-invariant from the model.

This may be a high price to pay for allowing the $x$-variables to be correlated with the individual-specific heterogeneity $u_i$.

For example, we may be interested in the effect of a time-invariant variable (like gender) on a person's wage. It is possible to derive IV estimators that lie in between the FE and RE approaches.

The Hausman-Taylor (1981) estimator is an IV estimator that enables the coefficients of time-invariant regressors to be estimated.

The key step is to distinguish between regressors that are uncorrelated with $u_i$ and those that are potentially correlated with $u_i$.

The method additionally distinguishes between time-varying and time-invariant regressors.

The individual effects model is then written as:

$$y_{it} = \beta_0 + \beta_1 x_{1it} + \beta_2 x_{2it} + \alpha_1 w_{1i} + \alpha_2 w_{2i} + u_i + \varepsilon_{it} \qquad (1)$$

where:
$x_{1it}$ = $k_1$ variables (exogenous) that are time-varying and uncorrelated with $u_i$
$x_{2it}$ = $k_2$ variables (endogenous) that are time-varying and correlated with $u_i$
$w_{1i}$ = $l_1$ variables (exogenous) that are time-invariant and uncorrelated with $u_i$
$w_{2i}$ = $l_2$ variables (endogenous) that are time-invariant and correlated with $u_i$



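The assumptions are:

$$E(u_i \mid x_{1it}, w_{1i}) = 0 \quad \text{but} \quad E(u_i \mid x_{2it}, w_{2i}) \neq 0$$
$$\mathrm{Var}(u_i \mid x_{1it}, x_{2it}, w_{1i}, w_{2i}) = \sigma^2_u$$
$$\mathrm{Cov}(u_i, \varepsilon_{it} \mid x_{1it}, x_{2it}, w_{1i}, w_{2i}) = 0$$
$$\mathrm{Corr}(u_i + \varepsilon_{it},\, u_i + \varepsilon_{is} \mid x_{1it}, x_{2it}, w_{1i}, w_{2i}) = \rho = \frac{\sigma^2_u}{\sigma^2_u + \sigma^2_\varepsilon}, \quad t \neq s$$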

Estimating Eq. (1) by OLS or GLS gives inconsistent estimates, because some regressors are correlated with the unobserved individual effects $u_i$ (the random effects).

The FE estimator does not allow us to estimate the parameters $\alpha_1$ and $\alpha_2$. RE cannot be used either: the correlation between the individual effects $u_i$ and the regressors associated with the $\beta_2$ and $\alpha_2$ parameters makes the RE estimates inconsistent.

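The Hausman-Taylor (H-T) method is based on the RE transformation, which leads to the model

$$\tilde{y}_{it} = \beta_1 \tilde{x}_{1it} + \beta_2 \tilde{x}_{2it} + \alpha_1 \tilde{w}_{1i} + \alpha_2 \tilde{w}_{2i} + \tilde{u}_i + \tilde{\varepsilon}_{it} \qquad (2)$$


where, for example, $\tilde{x}_{1it} = x_{1it} - \hat\theta\,\bar{x}_{1i}$.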
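The transformation behind Eq. (2) can be sketched in a few lines of Python for a single individual's series. The helper name `quasi_demean` is hypothetical, $\hat\theta$ is simply passed in rather than estimated, and the numbers are made up for illustration.

```python
from statistics import fmean


def quasi_demean(series, theta):
    """Return x_it - theta * xbar_i for one individual's time series.

    theta = 0 leaves the data unchanged (pooled OLS),
    theta = 1 gives deviations from the individual mean (within/FE),
    0 < theta < 1 gives the RE (GLS) transformation used in Eq. (2).
    """
    xbar = fmean(series)  # individual (time) mean
    return [x - theta * xbar for x in series]


x1_i = [2.0, 4.0, 6.0]          # hypothetical x_{1it} for one individual, T = 3
print(quasi_demean(x1_i, 0.0))  # [2.0, 4.0, 6.0]
print(quasi_demean(x1_i, 1.0))  # [-2.0, 0.0, 2.0]
print(quasi_demean(x1_i, 0.5))  # [0.0, 2.0, 4.0]
```

For a time-invariant variable such as $w_{1i}$, every period has the same value, so the transformation simply returns $(1-\hat\theta)w_{1i}$. This is why, unlike FE, it does not wipe out the time-invariant regressors when $\hat\theta < 1$.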


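Steps of the H-T estimator:

1. Estimate the model by OLS on deviations from the individual (time) means, i.e. run the within (FE) regression. The constant, $u_i$ and the time-invariant $w$ terms all drop out:

$$(y_{it} - \bar{y}_i) = (x_{1it} - \bar{x}_{1i})\beta_1 + (x_{2it} - \bar{x}_{2i})\beta_2 + (\varepsilon_{it} - \bar{\varepsilon}_i) \qquad (3)$$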


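2. (a) Using the Step 1 estimates $\hat\beta_1$ and $\hat\beta_2$, form the residuals in levels, $\hat{\varepsilon}_{it} = y_{it} - x_{1it}\hat\beta_1 - x_{2it}\hat\beta_2$. The residuals of the demeaned regression (3) average to zero within each group, so they would carry no information here. Then compute the within-group time mean of the residuals for each individual:

$$\bar{\varepsilon}_i = \frac{1}{T}\sum_{t=1}^{T} \hat{\varepsilon}_{it} \qquad (4)$$

and stack them into the vector $\bar{e} = \big((\bar\varepsilon_1, \bar\varepsilon_1, \ldots, \bar\varepsilon_1), \ldots, (\bar\varepsilon_n, \bar\varepsilon_n, \ldots, \bar\varepsilon_n)\big)'$, in which each $\bar\varepsilon_i$ is repeated $T$ times.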

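 (b) Run the first-stage regression of the endogenous time-invariant regressors on the exogenous time-invariant regressors and the individual means of $x_{1it}$:

$$w_{2i} = \gamma_0 + \gamma_1 w_{1i} + \gamma_2 \bar{x}_{1i} + e_i \qquad \text{(IV)} \qquad (5)$$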

(c) Use the predicted values $\hat{w}_{2i}$ from (b) to build the matrix $W = (W_1, \hat{W}_2)$, where the matrices $W_k$ are formed by stacking $w_{ki}$ ($T$ times) for each group $i$.

(d) Regress:

$$\bar{\varepsilon}_i = \alpha_0 + \alpha_1 w_{1i} + \alpha_2 \hat{w}_{2i} + \vartheta_i \qquad (6)$$


(e) Note: steps (b) to (d) together amount to a 2SLS regression of $\bar{e}$ on $W$, with $w_{1i}$ and $\bar{x}_{1i}$ as instruments.

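3. From Step 1, estimate $\sigma^2_\varepsilon$ as the residual variance of the within regression.
From Step 2, use the residual variance of the 2SLS regression as an estimate of $\sigma^2$, the variance of the group-level error $u_i + \bar{\varepsilon}_i$.

Since

$$\sigma^2 = \sigma^2_u + \frac{\sigma^2_\varepsilon}{T},$$

an estimate of $\sigma^2_u$ is

$$\hat\sigma^2_u = \hat\sigma^2 - \frac{\hat\sigma^2_\varepsilon}{T}.$$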


4. Next, compute the weights needed for FGLS.

Let

$$\hat\theta = 1 - \sqrt{\frac{\hat\sigma^2_\varepsilon}{\hat\sigma^2_\varepsilon + T\hat\sigma^2_u}}$$
 

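then, for each group $i$, let

$$V^*_{it} = [x_{1it}, x_{2it}, w_{1i}, w_{2i}] - \hat\theta\,[\bar{x}_{1i}, \bar{x}_{2i}, w_{1i}, w_{2i}] \qquad (8)$$
$$y^*_{it} = y_{it} - \hat\theta\,\bar{y}_i \qquad (9)$$

$$z_{it} = [(x_{1it} - \bar{x}_{1i}), (x_{2it} - \bar{x}_{2i}), w_{1i}, \bar{x}_{1i}] \qquad (10)$$

be the new weighted data and $z$ the matrix of instruments. Then run a 2SLS regression of $y^*$ on $V^*$ with instruments $z$: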

(a) Regress $V^*$ on $z$, then generate the predicted values $\hat{V}^*$.

(b) Regress $y^*$ on the predicted values $\hat{V}^*$ to get $(\hat\beta, \hat\alpha)$.

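5. To get the variance of $(\hat\beta, \hat\alpha)$, do not use the residuals of the second-stage regression in (b). They are computed from $\hat{V}^*$ rather than $V^*$, so the resulting variance estimate is not consistent. Instead, compute the residuals with the original regressors, $y^* - V^*(\hat\beta', \hat\alpha')'$. See Greene, Ch. 8, eq. (8.8).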
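Steps 3 and 4 reduce to two small formulas, sketched below in Python. This assumes a balanced panel with a common $T$. The variance estimates are hypothetical inputs, and the helper names are illustrative only: this is not how xthtaylor is implemented internally.

```python
import math


def variance_components(sigma2_eps, sigma2_between, T):
    """Step 3: recover sigma_u^2 from sigma^2 = sigma_u^2 + sigma_eps^2 / T."""
    sigma2_u = sigma2_between - sigma2_eps / T
    # Assumption (not part of the steps above): a negative estimate, which can
    # occur in finite samples, is truncated at zero.
    return max(sigma2_u, 0.0)


def gls_theta(sigma2_eps, sigma2_u, T):
    """Step 4: theta = 1 - sqrt(sigma_eps^2 / (sigma_eps^2 + T * sigma_u^2))."""
    return 1.0 - math.sqrt(sigma2_eps / (sigma2_eps + T * sigma2_u))


# Hypothetical inputs: within residual variance (Step 1), 2SLS residual
# variance (Step 2), and a balanced panel with T = 4 periods.
sigma2_eps, sigma2_between, T = 0.04, 0.26, 4
sigma2_u = variance_components(sigma2_eps, sigma2_between, T)
theta = gls_theta(sigma2_eps, sigma2_u, T)
print(round(sigma2_u, 4), round(theta, 4))  # 0.25 0.8039
```

When $\hat\sigma^2_u = 0$ the weight is $\hat\theta = 0$ and the data are left untransformed. As $T\hat\sigma^2_u$ grows relative to $\hat\sigma^2_\varepsilon$, $\hat\theta$ approaches 1 and the transformation approaches the within (FE) transformation.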



The steps of H-T therefore amount to estimating Eq. (2) by instrumental variables, using the following instrument set from Eq. (10): $(x_{1it} - \bar{x}_{1i})$, $(x_{2it} - \bar{x}_{2i})$, $w_{1i}$ and $\bar{x}_{1i}$:

(a) $\tilde{x}_{2it}$ is instrumented by its deviation from the individual means, $(x_{2it} - \bar{x}_{2i})$.
(b) $\tilde{w}_{2i}$ is instrumented by the individual average of $x_{1it}$, $\bar{x}_{1i}$.
(c) $\tilde{x}_{1it}$ is instrumented by its deviation from the individual means, $(x_{1it} - \bar{x}_{1i})$.
(d) $\tilde{w}_{1i}$ is instrumented by $w_{1i}$.
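The instrument set in Eq. (10) can be assembled for one individual as follows. This is a sketch that assumes balanced data held in plain Python lists. `ht_instruments` is a hypothetical helper, and the values in the usage line are invented.

```python
from statistics import fmean


def ht_instruments(x1, x2, w1):
    """Build the Eq. (10) instrument rows z_it for one individual i.

    x1: T rows, each holding the k1 exogenous time-varying values
    x2: T rows, each holding the k2 endogenous time-varying values
    w1: the l1 exogenous time-invariant values
    Returns T rows [x1_it - x1bar_i, x2_it - x2bar_i, w1_i, x1bar_i].
    """
    x1bar = [fmean(col) for col in zip(*x1)]  # individual means of x1
    x2bar = [fmean(col) for col in zip(*x2)]  # individual means of x2
    rows = []
    for x1_t, x2_t in zip(x1, x2):
        dev1 = [a - m for a, m in zip(x1_t, x1bar)]  # within deviations of x1
        dev2 = [a - m for a, m in zip(x2_t, x2bar)]  # within deviations of x2
        rows.append(dev1 + dev2 + list(w1) + x1bar)  # w1 and x1bar repeat over t
    return rows


# One individual, T = 2, with k1 = k2 = l1 = 1 (hypothetical values).
print(ht_instruments([[1.0], [3.0]], [[10.0], [14.0]], [1.0]))
# [[-1.0, -2.0, 1.0, 2.0], [1.0, 2.0, 1.0, 2.0]]
```

Only the deviations of $x_{2it}$ enter. Neither its levels nor $w_{2i}$ appear among the instruments, since both are correlated with $u_i$.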

                                              
The H-T estimator is based on an IV estimator that uses both the within and the between transformations of the strictly exogenous variables as instruments:

1. The exogenous variables serve as their own IVs.
2. The within transformations of the time-varying variables serve as IVs; in particular, the deviations $(x_{2it} - \bar{x}_{2i})$ serve as IVs for the endogenous individual-and-time-varying variables.
3. The individual means of the exogenous individual-and-time-varying variables are used as IVs for the endogenous time-invariant regressors.
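In matrix form, Steps 4 and 5 are standard 2SLS applied to the transformed data. The summary below is written in this post's notation and assumes that $V^*$, $y^*$ and $z$ are stacked over all $n$ individuals and $T$ periods. It is a compact restatement, not a formula quoted from Greene.

```latex
\begin{aligned}
\hat\delta_{HT} &= \begin{pmatrix}\hat\beta \\ \hat\alpha\end{pmatrix}
  = \left(V^{*\prime} P_z V^{*}\right)^{-1} V^{*\prime} P_z\, y^{*},
  \qquad P_z = z\,(z^{\prime} z)^{-1} z^{\prime}, \qquad \hat{V}^{*} = P_z V^{*} \\
\hat\sigma^2_\varepsilon &= \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T}
  \left(y^{*}_{it} - V^{*}_{it}\,\hat\delta_{HT}\right)^2,
  \qquad
  \widehat{\mathrm{Var}}(\hat\delta_{HT}) = \hat\sigma^2_\varepsilon
  \left(\hat{V}^{*\prime} \hat{V}^{*}\right)^{-1}
\end{aligned}
```

The residuals inside $\hat\sigma^2_\varepsilon$ use $V^*$, not $\hat{V}^*$, which is the point made in Step 5. Since $P_z$ is symmetric and idempotent, $\hat{V}^{*\prime}\hat{V}^{*} = V^{*\prime} P_z V^{*}$.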

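The model is identified if there are at least as many exogenous time-varying regressors $x_{1it}$ as there are endogenous time-invariant regressors $w_{2i}$, i.e. $k_1 \geq l_2$. If $k_1 > l_2$ (over-identified), the H-T estimator of $\beta_1$ and $\beta_2$ is more efficient than FE; if $k_1 = l_2$ (just-identified), it coincides with FE, but $\alpha_1$ and $\alpha_2$ can still be estimated.

If the model is under-identified, $k_1 < l_2$, then one cannot estimate the parameters $\alpha_1$ and $\alpha_2$, and the H-T estimators of $\beta_1$ and $\beta_2$ are identical to FE.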
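The order condition is a simple count, sketched here in Python. The function name is hypothetical. The usage line plugs in the variable counts of Eq. (11) from the Stata application in this post: four variables in $x_{1it}$ (south, smsa, ind, occ) and one in $w_{2i}$ (ed).

```python
def ht_order_condition(k1, l2):
    """Classify a Hausman-Taylor model by the order condition k1 >= l2.

    k1: number of exogenous time-varying regressors (x1), whose individual
        means serve as instruments for the endogenous time-invariant ones.
    l2: number of endogenous time-invariant regressors (w2).
    """
    if k1 < l2:
        return "under-identified"
    if k1 == l2:
        return "just-identified"
    return "over-identified"


print(ht_order_condition(k1=4, l2=1))  # over-identified
```

This checks only the counting (order) condition. Whether the instruments are actually informative is a separate question, which the correlation checks with pwcorr in the application address.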

The H-T estimator thus allows us to estimate the effects of time-invariant variables, even though some of the regressors are correlated with $u_i$.

The trick is to use the time averages of those time-varying regressors that are uncorrelated with $u_i$ as instruments for the time-invariant regressors.

This requires that the model includes enough time-varying variables that are uncorrelated with $u_i$.

The strong advantage of the H-T approach is that one does not have to use external instruments. With sufficient assumptions, instruments can be derived within the model.
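The trick can be illustrated with a stripped-down simulation. This is not the full H-T procedure. Only the Step 2 idea is mimicked: one endogenous time-invariant regressor $w_{2i}$, one instrument $\bar{x}_{1i}$, and a group-mean residual generated directly. All parameter values and function names are hypothetical.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n=20_000, alpha2=0.5, seed=1):
    """Return OLS and IV estimates of alpha2 from simulated individuals."""
    rng = random.Random(seed)
    xbar1, w2, ebar = [], [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)                    # individual effect u_i
        z = rng.gauss(0.0, 1.0)                    # xbar_1i: independent of u_i
        w = z + u + rng.gauss(0.0, 1.0)            # w_2i: correlated with u_i
        e = alpha2 * w + u + rng.gauss(0.0, 0.1)   # group-mean residual ebar_i
        xbar1.append(z)
        w2.append(w)
        ebar.append(e)
    ols = cov(w2, ebar) / cov(w2, w2)        # ignores corr(w_2i, u_i): biased
    iv = cov(xbar1, ebar) / cov(xbar1, w2)   # just-identified IV using xbar_1i
    return ols, iv


ols, iv = simulate()
print(f"OLS: {ols:.3f}  IV: {iv:.3f}  (true alpha2 = 0.5)")
```

With this design OLS converges to $0.5 + 1/3$, because $\mathrm{Cov}(w_{2i}, u_i)/\mathrm{Var}(w_{2i}) = 1/3$. The IV estimate, which uses only the variation in $w_{2i}$ driven by $\bar{x}_{1i}$, stays close to 0.5.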

 


ESTIMATION WITH STATA


To estimate the H-T model, we again use Paneldata01.dta.

In this data set, the log wage (lwage) is assumed to be a function of weeks worked (wks), living in the South (south), living in a metropolitan area (smsa), marital status (ms), years of education (ed), a quadratic in work experience (exp, exp2), working in manufacturing (ind), having a wage set by a union contract (union), having a blue-collar occupation (occ), being female (fem) and being African American (blk).

We use the xtsum command to show the within variation of each variable and to identify which variables are time-invariant:

xtsum lwage exp exp2 wks ms union occ south smsa ind fem blk ed
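The idea behind the three standard deviations reported by xtsum can be sketched in Python. `xtsum_like` is a hypothetical helper that assumes no missing values. In this sketch, "between" is the standard deviation of the individual means, and "within" is the standard deviation of $x_{it} - \bar{x}_i + \bar{x}$, with the grand mean added back. Stata's exact handling of details such as unbalanced panels may differ.

```python
from statistics import fmean, stdev


def xtsum_like(panel):
    """Overall, between and within standard deviations of one variable.

    panel maps an individual id to the list of that individual's values.
    """
    all_values = [x for values in panel.values() for x in values]
    grand_mean = fmean(all_values)
    means = {i: fmean(values) for i, values in panel.items()}
    # Deviations from each individual's mean, shifted back by the grand mean
    within = [x - means[i] + grand_mean
              for i, values in panel.items() for x in values]
    return {
        "overall": stdev(all_values),
        "between": stdev(list(means.values())),
        "within": stdev(within),
    }


# Hypothetical data: ed never changes within a person, wks does.
ed = {1: [12, 12, 12], 2: [16, 16, 16], 3: [9, 9, 9]}
wks = {1: [40, 45, 50], 2: [52, 48, 46], 3: [30, 35, 44]}
print(xtsum_like(ed)["within"])       # 0.0 -> time-invariant
print(xtsum_like(wks)["within"] > 0)  # True
```

A within standard deviation of zero is the signature of a time-invariant variable, which is how xtsum reveals that fem, blk and ed cannot be estimated by FE.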






Next, we check the correlation between the exogenous variables (south, smsa, ind, occ, fem and blk) and the endogenous time-invariant variable (ed):

pwcorr south smsa ind occ fem blk ed,star(0.05)

 



The results indicate that although fem appears to be a weak instrument, the remaining instruments are probably sufficiently correlated with ed to identify its coefficient.

Weak IVs lead to unreliable estimates of the coefficients on the endogenous variables, because there is not enough information to identify the parameters. They also cause serious size distortions in any tests performed (Stock, Wright & Yogo, 2002).

Let us also check the correlation between the exogenous variables (south, smsa, ind, occ, fem and blk) and the endogenous time-varying variables (wks, ms, exp, exp2, union):

pwcorr south smsa ind occ fem blk wks ms exp exp2 union,star(0.05)

 



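The H-T model we want to estimate therefore becomes:

$$y_{it} = \beta_0 + \beta_1 x_{1it} + \beta_2 x_{2it} + \alpha_1 w_{1i} + \alpha_2 w_{2i} + u_i + \varepsilon_{it} \qquad (11)$$

where:
$y_{it}$ = lwage
$x_{1it}$ = south, smsa, ind, occ
$x_{2it}$ = wks, ms, exp, exp2, union
$w_{1i}$ = fem, blk
$w_{2i}$ = ed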


Before estimating the H-T model, let us first estimate the FE and RE models:

xtreg lwage south smsa ind occ fem blk wks ms exp exp2 union ed, fe

 
 


 
xtreg lwage south smsa ind occ fem blk wks ms exp exp2 union ed, re


 


Now we perform the Hausman test to choose between the FE and RE models:

quiet xtreg lwage south smsa ind occ fem blk wks ms exp exp2 union ed, fe
estimates store fe
quiet xtreg lwage south smsa ind occ fem blk wks ms exp exp2 union ed, re
estimates store re
hausman fe re,sigmamore
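For reference, the statistic reported by hausman is the standard Hausman contrast, where $K$ is the number of coefficients being compared. Only coefficients estimated by both models enter the comparison. FE drops fem, blk and ed, which leaves the nine time-varying regressors south, smsa, ind, occ, wks, ms, exp, exp2 and union, consistent with the $\chi^2(9)$ in the results. The sigmamore option bases both covariance matrices on the disturbance variance estimated by the efficient (RE) model, which makes a non-positive-definite difference matrix less likely.

```latex
H = \left(\hat\beta_{FE} - \hat\beta_{RE}\right)^{\prime}
    \left[\widehat{\mathrm{Var}}(\hat\beta_{FE}) - \widehat{\mathrm{Var}}(\hat\beta_{RE})\right]^{-1}
    \left(\hat\beta_{FE} - \hat\beta_{RE}\right)
    \;\sim\; \chi^2(K) \quad \text{under } H_0
```

Under $H_0$ both estimators are consistent and RE is efficient. A large $H$ indicates that the coefficients differ systematically, which points to correlation between the regressors and $u_i$.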



 

 
The results show that the $\chi^2(9)$ statistic has a p-value of 0.000.

This leads to a strong rejection of the null hypothesis that RE provides consistent estimates.

Hence, the FE model is preferred.

The problem now is that we cannot recover the coefficients on fem, blk and ed, because the FE method drops the time-invariant variables.

To estimate Eq. (11) by H-T, treating wks, ms, exp, exp2, union and ed as endogenous:

xthtaylor lwage south smsa ind occ fem blk wks ms exp exp2 union ed, endog(exp exp2 wks ms union ed)



 



The estimates $\sigma_u = 0.9418$ and $\sigma_\varepsilon = 0.1518$ indicate that a large fraction of the total error variance is attributable to $u_i$.

The $z$-statistics indicate that several of the coefficients may not be significantly different from zero.

The coefficients on the time-invariant variables fem and blk have relatively large standard errors, whereas the standard error of the coefficient on ed is relatively small.
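The statement about the error variance can be made concrete with the variance share $\rho = \sigma^2_u/(\sigma^2_u + \sigma^2_\varepsilon)$, computed from the two reported estimates. This is a quick arithmetic check in Python, not part of the Stata output itself.

```python
# sigma_u and sigma_e as reported by xthtaylor for Eq. (11)
sigma_u, sigma_e = 0.9418, 0.1518
rho = sigma_u**2 / (sigma_u**2 + sigma_e**2)  # share of error variance due to u_i
print(f"rho = {rho:.4f}")  # rho = 0.9747
```

Roughly 97% of the composite error variance is therefore attributable to the individual effects $u_i$.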