The instrumental variables (IV) method is used in econometric
analysis to address the endogeneity of one or more explanatory
variables in a model. It can be used to obtain consistent estimators
in the presence of omitted variables, that is, variables left out of the
model that are correlated with the explanatory variables.
If the explanatory variable(s) in our model are contemporaneously
correlated with the error term, the OLS estimator is biased even
asymptotically. This is because the OLS procedure, in assigning “credit” to
the explanatory variable(s) for explaining variation in the dependent
variable, also assigns, in error, some of the disturbance-generated variation
of the dependent variable to the explanatory variable(s) with which that
disturbance is contemporaneously correlated.
Consider, as an example, the case in which the correlation between the
explanatory variable and the disturbance is positive. When the disturbance is
higher, the dependent variable is higher, and because of the positive
correlation between the disturbance and the explanatory variable, the
explanatory variable is likely to be higher too. This implies that too much
credit for making the dependent variable higher is likely to be assigned to
the explanatory variable.
This is illustrated in the figure: if the error term and the independent
(explanatory) variable are positively correlated, negative values of the
disturbance will tend to correspond to low values of the independent variable
and positive values of the disturbance will tend to correspond to high values
of the independent variable, creating a data pattern similar to that shown in
the diagram. The OLS estimating line clearly overestimates the slope of the
true relationship.
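The overestimation described above can be reproduced on synthetic data. The sketch below uses plain Python rather than Stata, only to keep it self-contained. The model \(y={{\beta }_{0}}+{{\beta }_{1}}x+u\), the parameter values, the weight 0.8 linking \(x\) to \(u\), and the function names are all assumptions made for illustration.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x: sample Cov(x, y) / Var(x)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, beta0=1.0, beta1=0.5, seed=12345):
    """Draw (x, y) where x shares a component with the disturbance u."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)              # disturbance
        xi = rng.gauss(0.0, 1.0) + 0.8 * u   # hypothetical link: Cov(x, u) = 0.8 > 0
        x.append(xi)
        y.append(beta0 + beta1 * xi + u)
    return x, y


x, y = simulate()
slope = ols_slope(x, y)
print(f"true slope: 0.5, OLS slope: {slope:.3f}")
```

Under these assumed values the OLS slope converges to \(0.5+0.8/1.64\approx 0.99\) rather than 0.5, so the fitted line is steeper than the true relationship however large the sample is.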
The IV procedure produces a consistent estimator in a situation in which one
or more explanatory variables are contemporaneously correlated with the error.
To use the IV estimator, one must first find an “instrument” for each
explanatory variable that is contemporaneously correlated with the error. That
means we must search for a new variable that is suitable to serve as an
instrument for each such explanatory variable in our model.
Let our simple regression model be written as
\(y={{\beta }_{0}}+{{\beta }_{1}}x+u\) (1)
where we think that \(x\) and \(u\) are correlated:
\(Cov\left( x,u \right)\ne 0\) (2)
The method of IV works whether or not \(x\) and \(u\) are correlated. However,
OLS should be used if \(x\) is uncorrelated with \(u\), because OLS is then
consistent and generally more precise than IV.
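Why OLS is appropriate only when \(Cov\left( x,u \right)=0\) can be seen from the probability limit of the OLS slope. The following is a sketch of the standard argument for Eq(1), assuming a random sample and \(Var\left( x \right)>0\); the hat notation for sample moments is introduced here for illustration only.

```latex
\[
\hat{\beta}_{1}^{OLS}
  =\frac{\widehat{Cov}\left( x,y \right)}{\widehat{Var}\left( x \right)}
  \;\xrightarrow{\;p\;}\;
  \frac{Cov\left( x,{{\beta }_{0}}+{{\beta }_{1}}x+u \right)}{Var\left( x \right)}
  ={{\beta }_{1}}+\frac{Cov\left( x,u \right)}{Var\left( x \right)}
\]
```

The second term vanishes only if \(Cov\left( x,u \right)=0\). When the covariance is positive, the OLS slope converges to a value above \({{\beta }_{1}}\), which is the overestimation described earlier.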
In order to obtain consistent estimators of \({{\beta }_{0}}\) and
\({{\beta }_{1}}\) when \(x\) and \(u\) are correlated, we need some additional
information. This information comes by way of a new variable that satisfies
certain properties.
Suppose that we have an observable variable \(z\) that satisfies these two
assumptions:
(1) \(z\) is uncorrelated with \(u\), that is,
\(Cov\left( z,u \right)=0\) (3)
(2) \(z\) is correlated with \(x\), that is,
\(Cov\left( z,x \right)\ne 0\) (4)
Then we call \(z\) an instrumental variable for \(x\), or simply an instrument
for \(x\).
The requirement that the instrument \(z\) satisfies Eq(3) is summarized by
saying “\(z\) is exogenous in Eq(1)”, and so we refer to Eq(3) as instrument
exogeneity.
In the context of omitted variables, instrument exogeneity means that \(z\)
should have no partial effect on \(y\) (after \(x\) and the omitted variables
have been controlled for), and \(z\) should be uncorrelated with the omitted
variables. Eq(4) means that \(z\) must be related, either positively or
negatively, to the endogenous explanatory variable \(x\). This condition is
often called instrument relevance.
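Taken together, Eq(3) and Eq(4) are enough to pin down \({{\beta }_{1}}\). The derivation below is a sketch of the standard argument. The bar notation for sample means, the index \(i\), and the label \(\hat{\beta}_{1}^{IV}\) are notation assumed here, not taken from the text above.

```latex
\begin{align*}
Cov\left( z,y \right)
  &= Cov\left( z,{{\beta }_{0}}+{{\beta }_{1}}x+u \right)
   = {{\beta }_{1}}\,Cov\left( z,x \right)+Cov\left( z,u \right) \\
  &= {{\beta }_{1}}\,Cov\left( z,x \right)
   && \text{by Eq(3)} \\
{{\beta }_{1}}
  &= \frac{Cov\left( z,y \right)}{Cov\left( z,x \right)}
   && \text{well defined by Eq(4)} \\
\hat{\beta}_{1}^{IV}
  &= \frac{\sum_{i=1}^{n}\left( z_{i}-\bar{z} \right)\left( y_{i}-\bar{y} \right)}
          {\sum_{i=1}^{n}\left( z_{i}-\bar{z} \right)\left( x_{i}-\bar{x} \right)}
   && \text{sample analogue}
\end{align*}
```

Setting \(z=x\) in the last line gives back the OLS slope, so OLS can be viewed as the special case in which \(x\) serves as its own instrument.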
Because Eq(3) involves the covariance between \(z\) and the unobserved error
\(u\), we cannot generally hope to test this assumption. In most cases we must
maintain that Eq(3) holds by appealing to economic behaviour or introspection.
By contrast, Eq(4), the condition that \(z\) is correlated with \(x\) (in the
population), can be tested given a random sample from the population. The
easiest way to do this is to estimate a simple regression of \(x\) on \(z\),
which we call the reduced form:
\(x={{\pi }_{0}}+{{\pi }_{1}}z+v\) (5)
Then, because \({{\pi }_{1}}=Cov\left( z,x \right)/Var\left( z \right)\),
assumption Eq(4) holds if, and only if, \({{\pi }_{1}}\ne 0\). Thus, we should
be able to reject the null hypothesis
\({{H}_{0}}:{{\pi }_{1}}=0\) (6)
against the two-sided alternative \({{H}_{a}}:{{\pi }_{1}}\ne 0\) at a
sufficiently small significance level. If this is the case, we can be fairly
confident that Eq(4) holds.
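The test of Eq(6) amounts to computing the usual OLS \(t\)-statistic on \({{\pi }_{1}}\) in Eq(5). As a minimal sketch in plain Python on synthetic data: the sample size, the true value \({{\pi }_{1}}=0.3\), the function name `first_stage`, and the use of the large-sample 5% critical value 1.96 are all assumptions for illustration, and the standard error is the usual homoskedastic one.

```python
import math
import random


def first_stage(z, x):
    """Regress x on z; return (pi1_hat, standard error, t statistic)."""
    n = len(z)
    mz = sum(z) / n
    mx = sum(x) / n
    szz = sum((a - mz) ** 2 for a in z)
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    pi1 = szx / szz
    pi0 = mx - pi1 * mz
    ssr = sum((b - pi0 - pi1 * a) ** 2 for a, b in zip(z, x))
    se = math.sqrt(ssr / (n - 2) / szz)   # usual homoskedastic OLS standard error
    return pi1, se, pi1 / se


rng = random.Random(2024)
z = [rng.gauss(0.0, 1.0) for _ in range(1000)]
x = [0.3 * zi + rng.gauss(0.0, 1.0) for zi in z]   # hypothetical true pi1 = 0.3

pi1_hat, se, t = first_stage(z, x)
print(f"pi1_hat = {pi1_hat:.3f}, se = {se:.3f}, t = {t:.2f}")
if abs(t) > 1.96:   # large-sample two-sided 5% critical value
    print("reject H0: pi1 = 0")
else:
    print("fail to reject H0: pi1 = 0")
```

In Stata the same quantities appear in the coefficient table of a regression of \(x\) on \(z\).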
For our discussion of IV estimation in Stata, let us use the data set mroz.dta
to estimate the return to education in the simple regression model
\(\log \left( wage \right)={{\beta }_{0}}+{{\beta }_{1}}educ+u\) (7)
We first need to generate the log of the wage variable:
gen lwage = log(wage)
For comparison, let us first obtain the OLS estimates for Eq(7):
reg lwage educ
The estimate of \({{\beta }_{1}}\) implies a return of almost 11% for another
year of education.
Now assume that there is an endogeneity problem in Eq(7), namely
\(Cov\left( educ,u \right)\ne 0\). That means we need an instrument for
\(educ\). Let us use father’s education \(\left( fatheduc \right)\) as an IV
for \(educ\); we then have to maintain that \(Cov\left( fatheduc,u \right)=0\).
The second requirement is that \(educ\) and \(fatheduc\) are correlated. We can
check this with a simple regression of \(educ\) on \(fatheduc\), the reduced
form
\(educ={{\pi }_{0}}+{{\pi }_{1}}fatheduc+v\) (8)
which is estimated in Stata with
reg educ fatheduc
The \(t\)-statistic on \(fatheduc\) is 9.43, which indicates that \(educ\) and
\(fatheduc\) have a statistically significant positive correlation.
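Then, using \(fatheduc\) as an IV for \(educ\), the IV estimates are obtained
in Stata with
ivreg lwage (educ=fatheduc)
In Stata 10 and later, ivreg has been superseded by ivregress; the equivalent
command is ivregress 2sls lwage (educ = fatheduc).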
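With a single instrument, the IV slope can also be read as a ratio of two OLS slopes, which links the IV command to the reduced form in Eq(8). As a sketch: the symbol \(\hat{\gamma}_{1}\) for the slope from a regression of \(lwage\) on \(fatheduc\) is hypothetical notation introduced here, and the hats denote sample moments.

```latex
\[
\hat{\beta}_{1}^{IV}
  =\frac{\widehat{Cov}\left( fatheduc,lwage \right)}{\widehat{Cov}\left( fatheduc,educ \right)}
  =\frac{\widehat{Cov}\left( fatheduc,lwage \right)/\widehat{Var}\left( fatheduc \right)}
        {\widehat{Cov}\left( fatheduc,educ \right)/\widehat{Var}\left( fatheduc \right)}
  =\frac{\hat{\gamma}_{1}}{\hat{\pi}_{1}}
\]
```

Here \(\hat{\pi}_{1}\) is the OLS slope from Eq(8). The identity is exact only when all regressions are run on the same observations, and because \(\hat{\pi}_{1}\) sits in the denominator, a weak correlation between \(educ\) and \(fatheduc\) would make the IV estimate unstable.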
The IV estimate of the return to education is 5.9%, which is barely more than
one-half of the OLS estimate. This suggests that the OLS estimate is too high,
which is consistent with omitted ability bias.
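The same pattern, OLS above IV when an omitted variable raises both the regressor and the outcome, can be reproduced on synthetic data. The sketch below is plain Python and does not use mroz.dta. Every number in it is a hypothetical choice for illustration, including the "true" slope of 0.06, and the variable named `ability` stands in for the omitted factor.

```python
import random


def cov(a, b):
    """Sample covariance with divisor n - 1."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)


rng = random.Random(7)
n = 50000
beta1 = 0.06   # hypothetical true slope, not an estimate from mroz.dta

z, x, y = [], [], []
for _ in range(n):
    ability = rng.gauss(0.0, 1.0)   # omitted variable, ends up in the error term
    zi = rng.gauss(0.0, 1.0)        # instrument: drawn independently of ability
    xi = 0.5 * zi + 0.5 * ability + rng.gauss(0.0, 1.0)
    yi = 1.0 + beta1 * xi + 0.3 * ability + rng.gauss(0.0, 0.5)
    z.append(zi)
    x.append(xi)
    y.append(yi)

b_ols = cov(x, y) / cov(x, x)   # OLS slope: picks up the ability effect
b_iv = cov(z, y) / cov(z, x)    # IV slope: uses only variation in x driven by z

print(f"true: {beta1:.3f}  OLS: {b_ols:.3f}  IV: {b_iv:.3f}")
```

Under these assumed values the OLS slope converges to \(0.06+0.15/1.5=0.16\), while the IV slope converges to 0.06. The IV estimate is noisier than OLS, which is why a large sample is used here.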