MathType

Sunday 19 June 2016

Within and Between Variation in Panel Data with Stata (Panel)



Dependent variables and regressors can potentially vary over both time and individuals.

Within variation: variation over time for a given individual. A time-invariant variable has zero within variation.

Between variation: variation across individuals. A variable that is the same for every individual in a given period has zero between variation.

Overall variation: variation over both time and individuals.



Individual mean (for a balanced panel with \(T\) periods per individual):

\({{\bar{x}}_{i}}=\frac{1}{T}\sum\nolimits_{t}{{{x}_{it}}}\)

Overall mean:

\(\bar{x}=\frac{1}{NT}\sum\nolimits_{i}{\sum\nolimits_{t}{{{x}_{it}}}}\)

Overall variance:

\(s_{o}^{2}=\frac{1}{NT-1}\sum\nolimits_{i}{\sum\nolimits_{t}{{{\left( {{x}_{it}}-\bar{x} \right)}^{2}}}}\)

Between variance:

\(s_{B}^{2}=\frac{1}{N-1}\sum\nolimits_{i}{{{\left( {{{\bar{x}}}_{i}}-\bar{x} \right)}^{2}}}\)

Within variance:

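\(s_{W}^{2}=\frac{1}{NT-1}\sum\nolimits_{i}{\sum\nolimits_{t}{{{\left( {{x}_{it}}-{{{\bar{x}}}_{i}} \right)}^{2}}}}\)

Equivalently, \(s_{W}^{2}\) is the variance of the recentered variable \({{x}_{it}}-{{\bar{x}}_{i}}+\bar{x}\). Adding the constant \(\bar{x}\) does not change the variance, but it centers the within deviations on \(\bar{x}\) rather than on zero.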

The overall variation can be decomposed into between variation and within variation:

\(s_{o}^{2}\approx s_{B}^{2}+s_{W}^{2}\)
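The approximation sign can be made precise. The following is a short derivation, assuming a balanced panel (every individual observed for the same \(T\) periods) and a between variance defined on the individual means with divisor \(N-1\):

```latex
\begin{align*}
\sum_{i}\sum_{t}(x_{it}-\bar{x})^{2}
  &= \sum_{i}\sum_{t}\bigl[(x_{it}-\bar{x}_{i})+(\bar{x}_{i}-\bar{x})\bigr]^{2} \\
  &= \sum_{i}\sum_{t}(x_{it}-\bar{x}_{i})^{2}+T\sum_{i}(\bar{x}_{i}-\bar{x})^{2}
  && \text{since } \sum_{t}(x_{it}-\bar{x}_{i})=0 \text{ for each } i, \\
s_{o}^{2} &= s_{W}^{2}+\frac{T(N-1)}{NT-1}\,s_{B}^{2}.
\end{align*}
```

The factor \(T(N-1)/(NT-1)\) is slightly below one and approaches one as \(N\) grows, which is why the decomposition holds only approximately. For a panel of 4165 observations on 595 individuals (so \(T=7\) if the panel is balanced), the factor is \(4158/4164\approx 0.9986\).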


We use the dataset Paneldata01.

To generate this variance decomposition:

xtsum
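xtsum needs the panel structure to be declared before it can separate the between and within components. The following is a minimal sketch, assuming the file is saved as Paneldata01.dta in the working directory and that the time variable is named t. The name t is an assumption; id and the other variable names are taken from the commands used in this post.

```stata
* Load the data (file name and location are assumptions)
use Paneldata01, clear

* Declare the panel: id identifies individuals, t is the assumed time variable
xtset id t

* Describe the panel pattern (number of individuals and periods, gaps)
xtdescribe

* Overall, between, and within statistics for selected variables
xtsum lwage exp wks ed south
```

Listing variables after xtsum restricts the table to those variables; xtsum on its own summarizes every variable in the dataset.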

 




































In xtsum output, Stata uses lowercase \(n\) to denote the number of individuals and uppercase \(N\) to denote the total number of individual-time observations.
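The three rows that xtsum reports for a variable can be approximated by hand, which also shows where \(N\) and \(n\) come from. This is a sketch for lwage, assuming the data are in memory; the new variable names (lwage_ibar, lwage_within, one_per_id) and the scalar name are illustrative.

```stata
* Individual means, repeated on every row of the same individual
egen double lwage_ibar = mean(lwage), by(id)

* Overall mean, taken from summarize's stored results
quietly summarize lwage
scalar lwage_bar = r(mean)

* Recentered within deviation: x_it - xbar_i + xbar
generate double lwage_within = lwage - lwage_ibar + lwage_bar

* Flag one row per individual so each individual mean is counted once
egen byte one_per_id = tag(id)

* Overall: uses all N individual-time observations
summarize lwage

* Between: uses one mean per individual, so n observations
summarize lwage_ibar if one_per_id

* Within: uses all N observations of the recentered deviation
summarize lwage_within
```

The standard deviations from the three summarize calls should correspond to the overall, between, and within rows of xtsum lwage.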

To tabulate a variable and obtain additional detail on its within and between variation:

  xttab south 

 


The overall summary shows that 71% of the 4165 individual-year observations had south=0 and 29% had south=1.

The between summary indicates that, of the 595 people, 72% had south=0 at least once and 31% had south=1 at least once. These percentages sum to more than 100% because people who moved into or out of the south during the panel are counted in both categories.

The within summary indicates that 95% of people who ever lived in the south always lived in the south during the time period covered by the panel, and 98% of those who ever lived outside the south always lived outside the south.
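The "at least once" counts behind the between summary can be reproduced directly. This is a sketch, assuming south is coded 0/1 and id identifies individuals; south_min, south_max, and id_tag are illustrative names.

```stata
* Smallest and largest value of south observed for each individual
egen byte south_min = min(south), by(id)
egen byte south_max = max(south), by(id)

* Flag one row per individual so that people, not observations, are counted
egen byte id_tag = tag(id)

* Individuals with south = 0 at least once
count if id_tag & south_min == 0

* Individuals with south = 1 at least once
count if id_tag & south_max == 1

* Individuals observed both in and out of the south (counted in both groups)
count if id_tag & south_min != south_max
```

The last count is the group that appears in both of the first two, which is what pushes the between percentages above 100% in total.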

To tabulate the transition probabilities of a variable from one period to the next:

xttrans south, freq

 


One period is lost in calculating transitions: each of the 595 individuals contributes one fewer observation, so 4165 − 595 = 3570 observations are used.

For a time-invariant variable, the diagonal entries will be 100% and the off-diagonal entries will be 0%.

For south, 99.2% of the observations in the south in one period remain in the south in the next period. Of those outside the south in one period, 99.7% remain outside the south in the next period.

The south variable is therefore close to time-invariant.
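The transition table is a cross-tabulation of last period's value against this period's value. This is a sketch using the lag operator, assuming the panel has been declared with xtset id t (t is an assumed name for the time variable); south_prev is an illustrative name.

```stata
* Value of south in the previous period; missing in each individual's first period
generate byte south_prev = L.south

* Rows: status last period; columns: status this period; row percentages
tabulate south_prev south, row
```

Because south_prev is missing in each individual's first period, those rows drop out of the table, which is the period that is lost when computing transitions.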

TIME-SERIES PLOTS FOR EACH INDIVIDUAL

We use line graphs to plot a variable over time for each individual.

To produce a separate line graph of lwage for each of the first 20 individuals in the sample:

xtline lwage if id<=20

 

 
To produce line graphs of lwage for the first 20 individuals in the same graph:

xtline lwage if id<=20, overlay

 



OVERALL SCATTERPLOT

When the interest is in the relationship between two variables only, or there is one key regressor, a useful starting point is a scatterplot of the dependent variable against the key regressor using data from all panel observations.

To produce a scatterplot of lwage against exp, and then add fitted linear and quadratic regression lines to it:

graph twoway (scatter lwage exp) (lfit lwage exp) (qfit lwage exp)







 



















WITHIN AND BETWEEN SCATTERPLOT

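The xtdata command transforms the variables in memory: option fe gives the within data (deviations from individual means), be gives the between data (individual means), and re gives the random-effects transformed data. Because xtdata replaces the data in memory, preserve the original data first and restore it afterward, so that each transformation starts from the original data.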

To produce a scatterplot of the within variation in lwage and exp:

preserve
xtdata, fe clear
graph twoway (scatter lwage exp) (lfit lwage exp) (qfit lwage exp)
restore


 


 


To produce a scatterplot of the between variation in lwage and exp:

preserve
xtdata, be clear
graph twoway (scatter lwage exp) (lfit lwage exp) (qfit lwage exp)
restore
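The same two plots can be built without overwriting the data, which makes the transformations explicit. This is a sketch with illustrative variable names (lwage_mean_i, exp_mean_i, lwage_dev, exp_dev, id_first), assuming id identifies individuals.

```stata
* Individual means of the two variables
egen double lwage_mean_i = mean(lwage), by(id)
egen double exp_mean_i   = mean(exp), by(id)

* Within data: deviations from individual means (centered on zero here)
generate double lwage_dev = lwage - lwage_mean_i
generate double exp_dev   = exp - exp_mean_i
graph twoway (scatter lwage_dev exp_dev) (lfit lwage_dev exp_dev) (qfit lwage_dev exp_dev)

* Between data: one point per individual, plotted at the individual means
egen byte id_first = tag(id)
graph twoway (scatter lwage_mean_i exp_mean_i if id_first) (lfit lwage_mean_i exp_mean_i if id_first)
```

The within plot here is centered on zero; if the xtdata version adds the overall mean back to the deviations, the two plots differ only by a shift of the axes, not in the shape of the scatter.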



 


















POOLED OLS REGRESSION WITH CLUSTER-ROBUST STANDARD ERRORS

The individual-specific-effects model for the scalar dependent variable \({{y}_{it}}\) specifies that:

\({{y}_{it}}={{\alpha }_{i}}+{{\mathbf{x}}_{it}}'\beta +{{\varepsilon }_{it}}\)  (1)

where \({{\mathbf{x}}_{it}}\) are regressors, \({{\alpha }_{i}}\) are random individual-specific effects, and \({{\varepsilon }_{it}}\) is an idiosyncratic error.

From our panel data, the econometric model that we want to estimate is:

\(lwag{{e}_{it}}=\alpha +{{\beta }_{1}}ex{{p}_{it}}+{{\beta }_{2}}exp{{2}_{it}}+{{\beta }_{3}}wk{{s}_{it}}+{{\beta }_{4}}e{{d}_{it}}+{{u}_{it}}\)  (2)

The variable ed is time-invariant, while exp and wks are time-varying.

Estimating Eq. (2) by pooled OLS yields consistent estimates of the \(\beta \)'s if the composite error \({{u}_{it}}\) is uncorrelated with the regressors.
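The composite error follows from rewriting Eq. (1) around a common intercept. This is a short sketch, where \(\alpha\) is taken to be the mean of the \({{\alpha }_{i}}\) (a notational assumption made here):

```latex
\begin{align*}
y_{it} &= \alpha_{i}+\mathbf{x}_{it}'\beta+\varepsilon_{it} \\
       &= \alpha+\mathbf{x}_{it}'\beta+\underbrace{(\alpha_{i}-\alpha)+\varepsilon_{it}}_{u_{it}}
\end{align*}
```

The term \({{\alpha }_{i}}-\alpha\) is the same in every period for individual \(i\), so \({{u}_{it}}\) and \({{u}_{is}}\) share a common component. Pooled OLS is consistent only if both that component and \({{\varepsilon }_{it}}\) are uncorrelated with the regressors.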

However, \({{u}_{it}}\) is likely to be correlated over time for a given individual, so we use cluster-robust standard errors that cluster on the individual.

The option vce(cluster clustervar) affects the standard errors and the variance-covariance matrix of the estimators, but not the estimated coefficients.

regress lwage exp exp2 wks ed , vce(cluster id)
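One way to see that clustering changes only the standard errors is to store both sets of results and print them side by side. This is a sketch, assuming exp2 already holds the square of exp as in the regression above; ols_default and ols_cluster are illustrative names for the stored results.

```stata
* Pooled OLS with default (i.i.d.) standard errors
regress lwage exp exp2 wks ed
estimates store ols_default

* Pooled OLS with standard errors clustered on the individual
regress lwage exp exp2 wks ed, vce(cluster id)
estimates store ols_cluster

* Coefficients and standard errors side by side
estimates table ols_default ols_cluster, b(%9.4f) se(%9.4f)
```

The coefficient rows should be identical across the two columns, while the standard errors differ.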

 
 
The output shows \({{R}^{2}}=0.28\), and the estimates imply that wages increase with experience until a peak at about 31 years [= 0.0447/(2 × 0.00072)] and then decline. Wages increase by 0.6% with each additional week worked and by 7.6% with each additional year of education.
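The peak comes from setting the derivative of Eq. (2) with respect to experience, \({{\beta }_{1}}+2{{\beta }_{2}}exp\), equal to zero, which gives \(exp=-{{\beta }_{1}}/(2{{\beta }_{2}})\). This is a sketch of how it might be computed from the stored estimates, assuming the clustered regression above was the last estimation command; peak_exp is an illustrative label.

```stata
* Experience level at which predicted log wage peaks, with a delta-method standard error
nlcom (peak_exp: -_b[exp]/(2*_b[exp2]))

* Arithmetic check using the rounded coefficients quoted in the text
display 0.0447/(2*0.00072)
```

Using nlcom rather than hand arithmetic also gives a confidence interval for the turning point, based on the cluster-robust variance matrix.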









