MathType

Monday, 26 December 2016

Instrumental Variables (IV) Estimation with Stata



The instrumental variables (IV) method is used in econometric analysis to address the endogeneity of one or more explanatory variables in a model. It can be used to obtain consistent estimators in the presence of omitted variables, that is, unobserved variables that are correlated with the included explanatory variables.

If the explanatory variable(s) in our model are contemporaneously correlated with the error term, the OLS estimator is biased even asymptotically. This is because the OLS procedure, in assigning “credit” to the explanatory variable(s) for explaining variation in the dependent variable, also assigns, in error, some of the disturbance-generated variation of the dependent variable to the explanatory variable(s) with which that disturbance is contemporaneously correlated.

Consider as an example the case in which the correlation between the explanatory variable and the disturbance is positive. When the disturbance is higher, the dependent variable is higher, and because of the correlation between the disturbance and the explanatory variable, the explanatory variable is likely to be higher as well. This implies that too much credit for making the dependent variable higher is likely to be assigned to the explanatory variable.

 


This is illustrated in the figure: if the error term and the independent (explanatory) variable are positively correlated, negative values of the disturbance will tend to correspond to low values of the independent variable, and positive values of the disturbance will tend to correspond to high values of the independent variable, creating a data pattern similar to that shown in the diagram. The OLS estimating line clearly overestimates the slope of the true relationship.

The IV procedure produces a consistent estimator in a situation in which an explanatory variable is contemporaneously correlated with the error. To use the IV estimator, one must first find an “instrument” for each explanatory variable that is contemporaneously correlated with the error. That means we must find a new variable that is suitable to serve as an instrument for each such explanatory variable in our model.

Let our simple regression model be written as

\(y={{\beta }_{0}}+{{\beta }_{1}}x+u\)                                   (1)

where we think that \(x\) and \(u\) are correlated;
\(Cov\left( x,u \right)\ne 0\)                                                      (2)

The method of IV yields consistent estimators whether or not \(x\) and \(u\) are correlated. However, OLS should be used if \(x\) is uncorrelated with \(u\), because in that case OLS is also consistent and is more efficient than IV.
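To see why OLS breaks down under Eq(2), a standard large-sample result for the simple regression in Eq(1) is helpful. Here \({{\hat{\beta }}_{1}}\) denotes the OLS slope estimator, a symbol introduced only for this illustration:

\(\text{plim}\,{{\hat{\beta }}_{1}}={{\beta }_{1}}+\frac{Cov\left( x,u \right)}{Var\left( x \right)}\)

The second term vanishes only when \(Cov\left( x,u \right)=0\). When the covariance is positive, the OLS slope converges to a value above \({{\beta }_{1}}\), which is the over-crediting described earlier; when it is negative, OLS converges to a value below \({{\beta }_{1}}\).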

In order to get consistent estimators of \({{\beta }_{0}}\) and \({{\beta }_{1}}\) when \(x\) and \(u\) are correlated, we need some additional information. This information comes by way of a new variable that satisfies certain properties.

Suppose that we have an observable variable \(z\) that satisfies these two assumptions;

(1) \(z\) is uncorrelated with \(u\) , that is,
\(Cov\left( z,u \right)=0\)                                                            (3)


(2) \(z\) is correlated with \(x\) , that is,
\(Cov\left( z,x \right)\ne 0\)                                                       (4)


Then we call \(z\) an instrumental variable for \(x\), or simply an instrument for \(x\).
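The two assumptions are exactly what is needed to identify \({{\beta }_{1}}\). As a short sketch of the standard argument, take the covariance of both sides of Eq(1) with \(z\):

\(Cov\left( z,y \right)={{\beta }_{1}}Cov\left( z,x \right)+Cov\left( z,u \right)\)

Eq(3) removes the last term, and Eq(4) allows us to divide by \(Cov\left( z,x \right)\):

\({{\beta }_{1}}=\frac{Cov\left( z,y \right)}{Cov\left( z,x \right)}\)

Replacing the population covariances with their sample counterparts gives the IV estimator of the slope (the superscript \(IV\) and the sample means \(\bar{x}\), \(\bar{y}\), \(\bar{z}\) are notation introduced here):

\(\hat{\beta }_{1}^{IV}=\frac{\sum\nolimits_{i=1}^{n}{\left( {{z}_{i}}-\bar{z} \right)\left( {{y}_{i}}-\bar{y} \right)}}{\sum\nolimits_{i=1}^{n}{\left( {{z}_{i}}-\bar{z} \right)\left( {{x}_{i}}-\bar{x} \right)}}\)

When \(z=x\), this expression reduces to the usual OLS slope.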

The requirement that the instrument \(z\) satisfies Eq(3) is often summarized by saying “\(z\) is exogenous in Eq(1)”, and we refer to Eq(3) as instrument exogeneity.

In the context of omitted variables, instrument exogeneity means that \(z\) should have no partial effect on \(y\) (after \(x\) and the omitted variables have been controlled for), and \(z\) should be uncorrelated with the omitted variables. Eq(4) means that \(z\) must be related, either positively or negatively, to the endogenous explanatory variable \(x\).

Because Eq(3) involves the covariance between \(z\) and the unobserved error \(u\), we generally cannot hope to test this assumption, and we must maintain that Eq(3) holds by appealing to economic behaviour or introspection.

By contrast, the condition in Eq(4) that \(z\) is correlated with \(x\) (in the population) can be tested, given a random sample from the population. The easiest way to do this is to estimate a simple regression of \(x\) on \(z\), which we call the reduced form:

\(x={{\pi }_{0}}+{{\pi }_{1}}z+v\)                               (5)

Then, because \({{\pi }_{1}}=Cov\left( z,x \right)/Var\left( z \right)\), the assumption in Eq(4) holds if, and only if, \({{\pi }_{1}}\ne 0\). Thus, we should be able to reject the null hypothesis

\({{H}_{0}}:{{\pi }_{1}}=0\)                                           (6)

against the two-sided alternative \({{H}_{a}}:{{\pi }_{1}}\ne 0\). If we can reject the null at a sufficiently small significance level, we can be fairly confident that Eq(4) holds.
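The logic so far can be tried out on simulated data before turning to a real dataset. The following is only a sketch under stated assumptions: the variable names (z, a, u, x, y), the seed, the sample size and all coefficient values are made up for illustration and do not come from any dataset used in this post.

```stata
* Simulated illustration; every name and parameter value here is an assumption
clear
set seed 12345
set obs 5000

generate z = rnormal()              // instrument: independent of the error
generate a = rnormal()              // unobserved factor that makes x endogenous
generate u = a + rnormal()          // structural error, contains a
generate x = 0.5*z + a + rnormal()  // x depends on z (relevance) and on a
generate y = 1 + 2*x + u            // true slope is 2

* Reduced form: test H0: pi_1 = 0
regress x z
test z

* OLS: the slope is pushed above 2 because Cov(x,u) > 0
regress y x

* IV using z as the instrument for x
ivregress 2sls y (x = z)
```

In this design \(Cov\left( x,u \right)=1\) and \(Var\left( x \right)=2.25\), so the OLS slope should settle near \(2+1/2.25\approx 2.44\) in large samples, while the IV slope should be close to the true value of 2.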

For our discussion of IV using Stata, let us use the data mroz.dta and estimate the return to education in the simple regression model

\(\log \left( wage \right)={{\beta }_{0}}+{{\beta }_{1}}educ+u\)                                (7)

We need to generate the log of \(wage\):

gen lwage = log(wage)
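Some copies of mroz.dta may already contain a variable named lwage, in which case the gen command stops with an error because the name is taken. A defensive sketch is shown below; it assumes that the file sits in the current working directory and that an existing lwage, if present, is indeed the log of wage.

```stata
* Assumption: mroz.dta is in the current working directory
use mroz.dta, clear

* Create lwage only if it does not already exist
capture confirm variable lwage
if _rc {
    generate lwage = log(wage)
}

* Check how many observations have a non-missing log wage
summarize wage lwage
```

Observations without a usable wage have a missing lwage and are dropped automatically from any regression that uses it.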

For comparison, let us first obtain the OLS estimates for Eq(7):

reg lwage educ

 


The estimate for \({{\beta }_{1}}\) implies an almost 11% return for another year of education.
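The percentage interpretation comes from the log-level form of Eq(7): multiplying the slope by 100 gives the approximate percentage change in wage for one more year of education, and \(100\left[ \exp \left( {{\beta }_{1}} \right)-1 \right]\) gives the exact percentage change. A minimal sketch using the coefficient stored by Stata after the regression:

```stata
* Re-run the OLS regression quietly so that _b[educ] is available
quietly regress lwage educ

* Approximate and exact percentage return to one more year of education
display "approximate % return: " 100*_b[educ]
display "exact % return:       " 100*(exp(_b[educ]) - 1)
```

For a small coefficient the two numbers are close, which is why the approximation is commonly reported.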

Let us assume that there is an endogeneity problem in Eq(7), that is, \(Cov\left( educ,u \right)\ne 0\). That means we need an instrument for \(educ\). Let us use father’s education \(\left( fatheduc \right)\) as an IV for \(educ\), so we have to maintain that \(Cov\left( fatheduc,u \right)=0\).

The second requirement is that \(educ\) and \(fatheduc\) are correlated. We can check this with a simple regression of \(educ\) on \(fatheduc\), the reduced form:

\(educ={{\pi }_{0}}+{{\pi }_{1}}fatheduc+v\)                       (8)

The reduced form is estimated in Stata with:

reg educ fatheduc
 


The \(t\)-statistic on \(fatheduc\) is 9.43, which indicates that \(educ\) and \(fatheduc\) have a statistically significant positive correlation.

Then, using \(fatheduc\) as an IV for \(educ\), the IV estimates are obtained in Stata with:

ivreg lwage (educ=fatheduc)
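In Stata 10 and later, the documented command for this estimator is ivregress, with ivreg kept for backward compatibility. The sketch below shows what should be the equivalent call; the small option requests the small-sample degrees-of-freedom adjustment that ivreg applies by default. The two estat commands are optional diagnostics that the post itself does not use.

```stata
* Same model with the newer command; "small" mimics ivreg's default adjustment
ivregress 2sls lwage (educ = fatheduc), small

* First-stage summary statistics for the instrument
estat firststage

* Tests of whether educ can be treated as exogenous
estat endogenous
```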


 

The IV estimate of the return to education is 5.9%, which is barely more than one-half of the OLS estimate. This suggests that the OLS estimate is too high, which is consistent with omitted ability bias.
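With one endogenous regressor and one instrument, the IV point estimate can also be reproduced in two explicit stages, which makes the mechanics visible. This is only an illustrative sketch: the name educ_hat is made up here, and the first stage is restricted to observations with non-missing values so that both stages use the same sample.

```stata
* First stage on the observations that enter the wage equation
regress educ fatheduc if !missing(lwage, educ, fatheduc)
predict educ_hat if e(sample), xb

* Second stage: the coefficient on educ_hat matches the IV point estimate
regress lwage educ_hat
```

The standard errors from the second regression are not valid, because they ignore that educ_hat is itself estimated. For inference, use the output of the IV command.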