
Friday, 12 May 2017

Estimating VAR model with Stata (time series)




Vector autoregressive (VAR) models have a long tradition as tools for multiple time series analysis (Quenouille, 1957). Being linear models, they are relatively easy to work with both in theory and in practice.

VAR models became popular for economic analysis when Sims (1980) advocated them as alternatives to simultaneous equations models. The availability of longer time series of observations emphasized the need for models that focus on the dynamic structure of the variables.

VAR models are easy to use for forecasting and can also be applied to economic analysis. Impulse response analysis and forecast error variance decomposition are typically used to disentangle the relations between the variables in a VAR model. Investigating structural hypotheses based on economic theory usually requires a priori assumptions that may not be testable with statistical methods.

The discovery of the importance of stochastic trends in economic variables and the development of cointegration analysis by Granger (1981), Engle and Granger (1987), Johansen (1995) and many others have led to important new developments in separating the long-run relations from the short-run dynamics of the generation process of a set of variables.
 
 


In a univariate autoregression, a stationary time-series variable \({{y}_{t}}\) can often be modeled as depending on its own lagged values:

\({{y}_{t}}={{\alpha }_{0}}+{{\alpha }_{1}}{{y}_{t-1}}+{{\alpha }_{2}}{{y}_{t-2}}+...+{{\alpha }_{p}}{{y}_{t-p}}+{{\varepsilon }_{t}}\)                                                 (1)
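
As a rough illustration, Eq(1) could be fitted by ordinary least squares using Stata's lag operator. This is only a sketch: the variable name y is hypothetical, the data are assumed to be already declared as a time series with tsset, and p = 2 is chosen purely for illustration.

```stata
* Sketch only: y is a hypothetical stationary series in a tsset dataset.
* Fit an AR(2) version of Eq(1) by OLS; L(1/2).y expands to L1.y and L2.y.
regress y L(1/2).y
```

The constant reported by regress plays the role of \({{\alpha }_{0}}\), and the coefficients on L1.y and L2.y play the roles of \({{\alpha }_{1}}\) and \({{\alpha }_{2}}\).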

When we analyze multiple time series, the natural extension of the autoregressive model is the vector autoregression, or VAR, in which a vector of variables is modeled as depending on their own lags and on the lags of the other variables in the vector.

In general, a VAR system contains a set of \(m\) variables, each expressed as a linear function of \(p\) lags of itself and of all the other \(m-1\) variables, plus an error term. (It is possible to include exogenous variables such as seasonal dummies or time trends in a VAR.)

With two variables, \(y\) and \(z\), the VAR with one lag looks like:

\({{y}_{t}}={{\beta }_{10}}+{{\beta }_{11}}{{y}_{t-1}}+{{\beta }_{12}}{{z}_{t-1}}+{{\varepsilon }_{yt}}\)                   (2)
\({{z}_{t}}={{\beta }_{20}}+{{\beta }_{21}}{{y}_{t-1}}+{{\beta }_{22}}{{z}_{t-1}}+{{\varepsilon }_{zt}}\)                   (3)

where it is assumed that (1) both \({{y}_{t}}\) and \({{z}_{t}}\) are stationary; (2) \({{\varepsilon }_{yt}}\) and \({{\varepsilon }_{zt}}\) are white noise disturbances with variances \(\sigma _{y}^{2}\) and \(\sigma _{z}^{2}\), respectively; and (3) \(\left\{ {{\varepsilon }_{yt}} \right\}\) and \(\left\{ {{\varepsilon }_{zt}} \right\}\) are uncorrelated with each other.

Applied macroeconomists use models of this form both to describe macroeconomic data and to perform causal inference and provide policy advice.

Eq(2) and Eq(3) constitute a first-order VAR (since the longest lag length is one). The structure of the system incorporates feedback, since \({{y}_{t}}\) and \({{z}_{t}}\) are allowed to affect each other.
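
For compactness, Eq(2) and Eq(3) can be stacked into a single matrix equation. The following is only a restatement of the same two equations, using the same symbols:

\(\begin{pmatrix} {{y}_{t}} \\ {{z}_{t}} \end{pmatrix}=\begin{pmatrix} {{\beta }_{10}} \\ {{\beta }_{20}} \end{pmatrix}+\begin{pmatrix} {{\beta }_{11}} & {{\beta }_{12}} \\ {{\beta }_{21}} & {{\beta }_{22}} \end{pmatrix}\begin{pmatrix} {{y}_{t-1}} \\ {{z}_{t-1}} \end{pmatrix}+\begin{pmatrix} {{\varepsilon }_{yt}} \\ {{\varepsilon }_{zt}} \end{pmatrix}\)

The off-diagonal elements \({{\beta }_{12}}\) and \({{\beta }_{21}}\) carry the feedback between the two variables; if both were zero, the system would collapse into two separate univariate autoregressions.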

Note that the terms \({{\varepsilon }_{yt}}\) and \({{\varepsilon }_{zt}}\) are pure innovations (or shocks) in \({{y}_{t}}\) and \({{z}_{t}}\), respectively. Because only lagged values appear on the right-hand side of Eq(2) and Eq(3), the cross effects operate with a one-period lag: if \({{\beta }_{21}}\ne 0\), a shock \({{\varepsilon }_{yt}}\) reaches \(z\) one period later through \({{y}_{t-1}}\) in Eq(3), and if \({{\beta }_{12}}\ne 0\), a shock \({{\varepsilon }_{zt}}\) reaches \(y\) one period later through \({{z}_{t-1}}\) in Eq(2).

In this sense Eq(2) and Eq(3) are reduced-form equations. A system in which \({{z}_{t}}\) entered the equation for \({{y}_{t}}\) contemporaneously, and \({{y}_{t}}\) the equation for \({{z}_{t}}\), would be a structural VAR rather than a reduced form.

Estimation using Stata

For simple VAR estimation with Stata, we will use the varbasic command.

The varbasic command allows us to fit a simple reduced-form VAR without constraints and graph the impulse-response functions (IRFs). The more general var command allows constraints to be placed on the coefficients.

The lags, which are given as a numlist, default to lags(1 2). Note that we must list every lag to be included; for instance, lags(4) would include only the fourth lag, whereas lags(1/4) would include the first four lags.
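
The difference between these lag specifications can be sketched as follows. The variable names y and z are hypothetical, the data are assumed to be tsset already, and the nograph option is added only to suppress the IRF graph that varbasic would otherwise draw.

```stata
* Sketch only: y and z are hypothetical variables in a tsset dataset.

* Lags 1, 2, 3 and 4 of every variable enter each equation
varbasic y z, lags(1/4) nograph

* Only the fourth lag enters each equation (lags 1 to 3 are omitted)
varbasic y z, lags(4) nograph

* No lags() option: the default lags(1 2) is used
varbasic y z, nograph
```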

When writing down the VAR, we make two basic model-selection choices. First, we choose which variables to include in the VAR. This decision is typically motivated by the research question and guided by theory. Second, we choose the lag length. Usually we rely on formal lag-length selection criteria to guide us to the appropriate number of lags to include in the VAR model.

Once the lag length has been determined, we may proceed to estimation; once the parameters of the VAR have been estimated, we can perform postestimation procedures to assess model fit.

For our discussion, let us use the dataset Data09.dta. The variables in our VAR are lrgrossinv (log of real gross fixed capital formation), lrconsump (log of real household consumption expenditure) and lrgdp (log of real gross domestic product).

Hence, the VAR model to be estimated is:

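\({{\text{Y}}_{t}}={{a}_{0}}+{{\text{A}}_{1}}{{\text{Y}}_{t-1}}+...+{{\text{A}}_{k}}{{\text{Y}}_{t-k}}+{{\varepsilon }_{t}}\)                   (4)

where \({{\text{Y}}_{t}}\) is the vector of the three variables (lrgrossinv, lrconsump, lrgdp) at time \(t\), \({{a}_{0}}\) is a vector of intercept terms, each of \({{\text{A}}_{1}}\) to \({{\text{A}}_{k}}\) is a \(3\times 3\) matrix of coefficients, and \({{\varepsilon }_{t}}\) is the vector of error terms. VARs with these variables, or close analogues to them, are common in monetary policy analysis.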

Before we begin, let us first declare the data to be a quarterly time series:

tsset t,quarterly
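
The command above assumes that t already holds a quarterly date. If a dataset instead stored the date as separate year and quarter variables, a quarterly index could be built roughly as below. This is a sketch under that assumption: the names year, quarter and tq are hypothetical and are not part of Data09.dta as described here.

```stata
* Sketch only: year and quarter are hypothetical numeric variables
* (for example year = 1990, quarter = 1, 2, 3 or 4).

* Build a Stata quarterly date from the two components
generate tq = yq(year, quarter)

* Display it as 1990q1, 1990q2, ...
format tq %tq

* Declare the data as a quarterly time series
tsset tq, quarterly
```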
 
 
The next step is to decide on the appropriate lag length. To do this, we use the varsoc command to run lag-order selection diagnostics with a maximum lag length of 10.

varsoc lrgrossinv lrconsump lrgdp, maxlag(10)

 

The results show a list of lag-order selection statistics. More information on these statistics is available in help varsoc.

Both the likelihood-ratio test and Akaike’s information criterion recommend seven lags, so we use seven lags throughout the rest of our VAR analysis.

With the variables and lag length in hand, there are two objects to estimate: the coefficient matrices and the covariance matrix of the error term. The coefficients can be estimated by least squares, equation by equation. The covariance matrix of the errors can be estimated from the sample covariance matrix of the residuals. The var command performs both tasks.
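
To make the equation-by-equation idea concrete, the lrgrossinv equation could be fitted on its own by OLS, as sketched below. This assumes Data09.dta is loaded and tsset as above, and uses the seven lags chosen earlier. The point estimates should coincide with those that var reports for the same equation; no claim is made here about the standard errors, which depend on the small-sample options used.

```stata
* Sketch only: one equation of the VAR, fitted by OLS.
* L(1/7).x expands to lags 1 through 7 of x.
regress lrgrossinv L(1/7).lrgrossinv L(1/7).lrconsump L(1/7).lrgdp
```

Repeating the regression with lrconsump and then lrgdp as the dependent variable would give the other two equations.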

Now we estimate the VAR model in Eq(4) with seven lags:

var lrgrossinv lrconsump lrgdp, lags(1/7) dfk small

 

The estimated covariance matrix of the error terms can be found in the stored result e(Sigma):

matlist e(Sigma)

 

The output of var organizes its results by equation, where an “equation” is identified with its dependent variable. For Eq(4), there are three equations: one for the log of real gross fixed capital formation, one for the log of real household consumption expenditure, and one for the log of real gross domestic product.

e(Sigma) holds the covariance matrix of the estimated residuals from the VAR. Note that the residuals are correlated across the equations.

As expected, the table of VAR coefficients is rather long. Not including the constant terms, a VAR with \(m\) variables and \(p\) lags has \(p{{m}^{2}}\) coefficients. Our VAR has 3 variables and 7 lags, which gives \(7\times {{3}^{2}}=63\) lag coefficients plus 3 constant terms, for a total of 66 coefficients that are estimated with only 200 observations.

We specify the options dfk and small to apply small-sample corrections to the large-sample statistics that are reported by default.
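
Covariances are hard to read because they depend on the scale of each residual. One way to see how strongly the residuals are correlated is to convert e(Sigma) into a correlation matrix, as in the sketch below. It assumes the var command has just been run; the matrix names S and C are arbitrary.

```stata
* Sketch only: run directly after the var command.

* Copy the residual covariance matrix out of the stored results
matrix S = e(Sigma)

* corr() converts a variance matrix into the matching correlation matrix
matrix C = corr(S)

* Off-diagonal entries are the cross-equation residual correlations
matrix list C
```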

From the VAR output, we can glance down the table of coefficients, standard errors, \(t\)-statistics, and \(p\)-values. In applied papers, however, the table of VAR coefficients is usually not reported. Instead, we usually report postestimation statistics that are (hopefully) more informative. The most popular postestimation analyses to run first after a VAR are forecasting, causality analysis and impulse response function analysis.
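
As a starting point, these postestimation steps might look like the sketch below when run immediately after the var command. The forecast prefix f_, the IRF name var7, the IRF file name varirf, the horizons, and the choice of lrgdp as impulse and lrconsump as response are all illustrative assumptions rather than recommendations.

```stata
* Sketch only: run after  var lrgrossinv lrconsump lrgdp, lags(1/7) dfk small

* Stability check: the VAR is stable if all eigenvalues lie inside the unit circle
varstable

* Granger causality Wald tests for every equation
vargranger

* Dynamic forecasts 8 quarters ahead, stored in new variables prefixed f_
fcast compute f_, step(8)
fcast graph f_lrgrossinv f_lrconsump f_lrgdp

* Impulse-response functions over 20 quarters, saved in the file varirf
irf create var7, set(varirf, replace) step(20)

* Orthogonalized IRF: response of lrconsump to a shock in lrgdp
irf graph oirf, impulse(lrgdp) response(lrconsump)
```

Orthogonalized IRFs depend on the ordering of the variables in the var command, so a different ordering can give different responses.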



