If the individual effect \({{u}_{i}}\) (a cross-sectional or time-specific effect) does not exist, that is \({{u}_{i}}=0\), OLS produces efficient and consistent parameter estimates. The model is
\({{y}_{it}}={{\beta }_{0}}+{{\beta }_{1}}{{x}_{it}}+{{u}_{i}}+{{v}_{it}}\) (1)
and we assume that \({{u}_{i}}=0\).
OLS rests on five core assumptions (Greene, 2008; Kennedy, 2008):
o Linearity – the dependent variable is a linear function of the independent variables and the disturbance.
o Exogeneity – the expected value of the disturbance is zero, or the disturbances are not correlated with any regressor.
o Homoscedasticity and no autocorrelation – the disturbances have the same variance and are not related to one another.
o Non-stochastic regressors – the independent variables are not stochastic but fixed in repeated samples.
o Full rank – there is no exact linear relationship among the independent variables.
There are several strategies for estimating a fixed effect model: the least squares dummy variable (LSDV) model, within estimation, and between estimation.
LSDV
The least squares dummy variable (LSDV) model is widely used because it is relatively easy to estimate and interpret substantively. However, LSDV becomes problematic when there are many individuals (or groups) in the panel data.
If \(T\) is fixed and \(n\to \infty \) (\(n\) is the number of groups or firms, and \(T\) is the number of time periods), the parameter estimates of the regressors are consistent but the coefficients of the individual effects, \({{\beta }_{0}}+{{u}_{i}}\), are not (Baltagi, 2001). If the LSDV model includes a large number of dummy variables, the number of parameters increases as \(n\) increases.
LSDV therefore loses \(n\) degrees of freedom and returns less efficient estimators. Under these circumstances LSDV is impractical, and another strategy is called for.
\({{y}_{it}}={{\beta }_{1i}}+{{\beta }_{2}}{{x}_{it}}+{{v}_{it}}\) (2)
We put the subscript \(i\) on the intercept term to indicate that the intercepts of the individuals may differ, and that the differences may be due to special features of each individual.
Within Estimation
Unlike LSDV, the
“within” estimation does not need dummy variables, but it uses deviations from
group (or time period) means. That is, “within” estimation uses variation
within each individual or entity instead of a large number of dummies.
To get the FE estimator by “within” estimation, for each \(i\) we average Eq(1) over time,
\({{\bar{y}}_{i}}={{\beta
}_{0}}+{{\beta }_{1}}{{\bar{x}}_{i}}+{{u}_{i}}+{{\bar{v}}_{i}}\) (3)
where \({{\bar{y}}_{i}}={{T}^{-1}}\sum\nolimits_{t=1}^{T}{{{y}_{it}}}\) , \({{\bar{x}}_{i}}={{T}^{-1}}\sum\nolimits_{t=1}^{T}{{{x}_{it}}}\) and \({{\bar{v}}_{i}}={{T}^{-1}}\sum\nolimits_{t=1}^{T}{{{v}_{it}}}\)
Because \({{u}_{i}}\)
is fixed over time, it still appears in Eq(3).
Subtract Eq(3)
from Eq(1) for each \(t\) ;
\({{y}_{it}}-{{\bar{y}}_{i}}={{\beta
}_{1}}\left( {{x}_{it}}-{{{\bar{x}}}_{i}} \right)+{{v}_{it}}-{{\bar{v}}_{i}}\)
or
\({{\ddot{y}}_{it}}={{\beta
}_{1}}{{\ddot{x}}_{it}}+{{\ddot{v}}_{it}}\) (4)
where \({{\ddot{y}}_{it}}={{y}_{it}}-{{\bar{y}}_{i}}\) is the time-demeaned data on \(y\), and similarly for \({{\ddot{x}}_{it}}\) and \({{\ddot{v}}_{it}}\).
The parameter estimates of the regressors in the “within” estimation are identical to those of LSDV, and the “within” estimation reports the correct RSS. The FE “within” estimator allows for arbitrary correlation between \({{u}_{i}}\) and the explanatory variables in any time period, just as with first differencing.
Because of this, any explanatory variable that is constant over time for all \(i\) gets swept away by the fixed effects transformation (Kennedy, 2008; Wooldridge, 2009).
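As a rough sketch of Eq(4) in Stata, the time-demeaning can be carried out by hand. The variable names cost, output, fuel, load and airline are borrowed from the airline example used later in this post; the m_ and dm_ prefixes are hypothetical names chosen only for illustration.

```stata
* Group (airline) means of each variable
egen m_cost   = mean(cost),   by(airline)
egen m_output = mean(output), by(airline)
egen m_fuel   = mean(fuel),   by(airline)
egen m_load   = mean(load),   by(airline)

* Time-demeaned variables, as in Eq(4)
gen dm_cost   = cost   - m_cost
gen dm_output = output - m_output
gen dm_fuel   = fuel   - m_fuel
gen dm_load   = load   - m_load

* OLS on the demeaned data; the intercept has been swept away
reg dm_cost dm_output dm_fuel dm_load, noconstant
```

The slope estimates from this regression should match those of LSDV, but the reported standard errors are too small: reg does not know that \(n\) group means were estimated first, so it uses too many residual degrees of freedom. The xtreg command with the fe option makes that correction.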
Between Estimation
Eq(3) is also called the “between group” estimation, or the group mean regression, which uses variation between individual entities (groups).
Specifically, this estimation calculates the group means of the dependent and independent variables and thus reduces the number of observations to \(n\).
Because only cross-section variation in the data is used, the coefficient of any individual-invariant regressor, such as time dummies, cannot be identified.
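A minimal sketch of the between estimation in Stata, assuming the airline variables used later in this post (cost, output, fuel, load, airline, year). The be option of xtreg fits the group mean regression; the collapse version does the same thing by hand.

```stata
* Declare the panel structure, then fit the between estimator
xtset airline year
xtreg cost output fuel load, be

* The same regression by hand: one observation of group means per airline
preserve
collapse (mean) cost output fuel load, by(airline)
reg cost output fuel load
restore
```

With a balanced panel the two regressions should give the same coefficients. The preserve and restore pair keeps the original data in memory after collapse has replaced it with the group means.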
Estimation using Stata
For our discussion of the FE model using Stata, let us use the data airline.dta. We want to estimate the effects of output, fuel price and load factor on the cost of airline companies;
\(cos{{t}_{it}}={{\beta
}_{0}}+{{\beta }_{1}}outpu{{t}_{it}}+{{\beta }_{2}}fue{{l}_{it}}+{{\beta
}_{3}}loa{{d}_{it}}+{{v}_{it}}\) (5)
where
\(cos{{t}_{it}}\) = cost of airline companies
\(outpu{{t}_{it}}\) = revenue passenger miles (output index)
\(fue{{l}_{it}}\) = fuel price
\(loa{{d}_{it}}\) = load factor (average capacity utilization of the fleet)
Now, let us estimate Eq(5) by pooled OLS:
reg cost output fuel load
The results show that the pooled OLS model fits the data well, with a high \({{R}^{2}}\), and all the variables are significant even at the 1% level.
To estimate the LSDV model, let us examine fixed group effects by introducing group (airline) dummy variables.
Let us define the dummy variables as
g1 = 1 for airline 1; 0 otherwise.
g2 = 1 for airline 2; 0 otherwise.
.
.
g6 = 1 for airline 6; 0 otherwise.
Now we generate the new series of dummy variables, one for each group (airline);
gen g1=(airline==1)
gen g2=(airline==2)
gen g3=(airline==3)
gen g4=(airline==4)
gen g5=(airline==5)
gen g6=(airline==6)
list airline year g1-g6 if year<=2,noobs
The LSDV model from Eq(5) becomes
\(cos{{t}_{it}}={{\beta }_{0}}+{{\beta }_{1}}outpu{{t}_{it}}+{{\beta }_{2}}fue{{l}_{it}}+{{\beta }_{3}}loa{{d}_{it}}+{{u}_{1}}{{g}_{1}}+{{u}_{2}}{{g}_{2}}+{{u}_{3}}{{g}_{3}}+{{u}_{4}}{{g}_{4}}+{{u}_{5}}{{g}_{5}}+{{v}_{it}}\) (6)
Five group dummies \(\left( {{g}_{1}}-{{g}_{5}} \right)\) are added to the pooled OLS equation. We exclude \({{g}_{6}}\) from the regression equation to avoid perfect multicollinearity, the so-called dummy variable trap.
The coefficients \(\left( {{u}_{1}}-{{u}_{5}} \right)\) are the parameter estimates of the group dummy variables \(\left( {{g}_{1}}-{{g}_{5}} \right)\), respectively.
Now, let us estimate Eq(6):
reg cost output fuel load g1 g2 g3 g4 g5
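If the installed Stata supports factor variables (Stata 11 or later, which is an assumption about the reader's setup), the same LSDV regression can be sketched without generating the dummies by hand. The ib6. prefix asks for airline 6 as the base category, mirroring the choice of dropping g6.

```stata
* LSDV with factor-variable notation; airline 6 is the omitted (base) group
reg cost output fuel load ib6.airline
```

The coefficients reported for airlines 1 to 5 should correspond to \({{u}_{1}}\) to \({{u}_{5}}\) in the dummy-variable regression.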
The LSDV model seems to fit better than the pooled OLS. The F-statistic increased from 2419.34 to 3935.79, the RSS decreased from 1.335 to 0.293, and the \({{R}^{2}}\) increased from 0.988 to 0.997.
Because we included the dummy variables, the model loses five degrees of freedom. The parameter estimates from the LSDV model also differ from those of the pooled OLS model, but the signs remain the same.
The LSDV model posits that each airline has its own intercept but shares the same slopes.
The parameter estimate for \({{g}_{6}}\) (the dropped dummy, Airline 6) is represented in the LSDV model by the intercept (9.793), which is the benchmark intercept (reference point).
The values of \(\left( {{u}_{1}}-{{u}_{5}} \right)\) represent the deviations of the group-specific intercepts from the benchmark intercept (Airline 6). For example, \({{u}_{1}}=-0.087\) means that the intercept of Airline 1 is smaller than that of Airline 6 by 0.087, so the intercept for Airline 1 is \({{\beta }_{0}}+{{u}_{1}}=9.793-0.087=9.706\).
The equations for
each airline will become;
Airline 1: \(cos\hat{t}=9.706+0.919*outpu{{t}_{it}}+0.417*fue{{l}_{it}}-1.070*loa{{d}_{it}}\)
Airline 2: \(cos\hat{t}=9.665+0.919*outpu{{t}_{it}}+0.417*fue{{l}_{it}}-1.070*loa{{d}_{it}}\)
Airline 3: \(cos\hat{t}=9.497+0.919*outpu{{t}_{it}}+0.417*fue{{l}_{it}}-1.070*loa{{d}_{it}}\)
Airline 4: \(cos\hat{t}=9.890+0.919*outpu{{t}_{it}}+0.417*fue{{l}_{it}}-1.070*loa{{d}_{it}}\)
Airline 5: \(cos\hat{t}=9.730+0.919*outpu{{t}_{it}}+0.417*fue{{l}_{it}}-1.070*loa{{d}_{it}}\)
Airline 6: \(cos\hat{t}=9.793+0.919*outpu{{t}_{it}}+0.417*fue{{l}_{it}}-1.070*loa{{d}_{it}}\)
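The airline-specific intercepts can also be obtained with standard errors rather than by hand arithmetic. A minimal sketch, assuming the dummies g1 to g5 created earlier are still in memory:

```stata
* Re-fit the LSDV model without printing the table
quietly reg cost output fuel load g1-g5

* Intercept of Airline 1 = benchmark intercept + its deviation
lincom _cons + g1

* Intercept of Airline 2
lincom _cons + g2
```

The first lincom should reproduce the 9.706 computed above, together with a standard error and a confidence interval.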
Let us compare the pooled OLS and LSDV side by side with the Stata command estout. If it is not available, install it by typing ssc install estout.
* pooled OLS
quiet reg cost output fuel load
estimates store pooled
* LSDV
quiet reg cost output fuel load g1-g5
estimates store LSDV
* create table
estout pooled LSDV, cells(b(star fmt(3)) se(par fmt(3))) stats(F df_r rss rmse r2 r2_a N)
Note:
F = F-statistic
df_r = residual degrees of freedom
rss = residual sum of squares
rmse = root mean square error
r2 = R-squared
r2_a = adjusted R-squared
N = number of observations
The parameter estimates of the regressors show some differences between the pooled OLS and LSDV, but all of them are statistically significant at the 1% level.
The pooled OLS reports the overall intercept. The LSDV reports the intercept of the dropped (benchmark) group and the deviations of the other five intercepts from the benchmark.
Another way to estimate the FE model is by “within” estimation. The Stata xtreg command computes the “within group” estimator without creating dummy variables.
Before we run the xtreg command, we first need to specify the cross-sectional and time-series variables,
xtset airline year
To estimate the FE model by “within” estimation as in Eq(4);
xtreg cost output fuel load, fe
The F-test in the last line of the output examines the null hypothesis that the five dummy parameters in LSDV are zero \(\left( {{u}_{1}}={{u}_{2}}={{u}_{3}}={{u}_{4}}={{u}_{5}}=0 \right)\).
The large F-statistic rejects the null hypothesis in favor of the fixed group effect. The intercept of 9.713 is the average intercept.
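The same F-test can be sketched from two OLS regressions, by comparing the RSS of the pooled model with that of LSDV. The scalar names below are hypothetical, and the dummies g1 to g5 are assumed to be in memory.

```stata
* Restricted model: pooled OLS
quietly reg cost output fuel load
scalar rss_pooled = e(rss)

* Unrestricted model: LSDV with five group dummies
quietly reg cost output fuel load g1-g5
scalar rss_lsdv = e(rss)
scalar df_lsdv  = e(df_r)

* F = [(RSS_pooled - RSS_LSDV)/5] / [RSS_LSDV/df_LSDV]
scalar F_fe = ((rss_pooled - rss_lsdv)/5) / (rss_lsdv/df_lsdv)
display "F = " F_fe "   p-value = " Ftail(5, df_lsdv, F_fe)

* Built-in equivalent: joint test that the five dummy coefficients are zero
testparm g1-g5
```

The numerator uses five degrees of freedom because five dummies are added to the pooled model; the denominator uses the residual degrees of freedom of the LSDV regression.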
The xtreg command does not display an analysis of variance (ANOVA) table including the SSE. Since many related statistics are stored as e() results, we need to run ereturn or display to get them.
ereturn list
To display the model sum of squares (MSS), also called the explained sum of squares (ESS), and the residual sum of squares (RSS);
display e(mss) " " e(rss)
To get the Root MSE, whose formula is \(\sqrt{RSS/d{{f}_{r}}}\) with \(d{{f}_{r}}\) the residual degrees of freedom;
display sqrt(e(rss)/e(df_r))
To display the values of \({{R}^{2}}\) and adjusted \({{R}^{2}}\);
display e(r2) " " e(r2_a)
Let us compare the OLS, LSDV and “within” estimations;
reg cost output fuel load
estimates store OLS
reg cost output fuel load g1-g5
estimates store LSDV
xtreg cost output fuel load,fe
estimates store xtreg
estout OLS LSDV xtreg, cells(b(star fmt(3)) se(par fmt(3))) stats(F df_r mss rss rmse r2 r2_a F_f F_absorb N)
Note:
F = F-statistic
df_r = residual degrees of freedom
rss = residual sum of squares
mss = model (explained) sum of squares
rmse = root mean square error
r2 = R-squared
r2_a = adjusted R-squared
F_f = F-test (fixed effect)
F_absorb = F-test (fixed effect)
N = number of observations
The results contrast the output of the pooled OLS with that of the fixed effect estimations (LSDV and xtreg).
Unlike the pooled OLS, the two FE estimations produce the same RMSE, parameter estimates and standard errors, but report slightly different goodness-of-fit measures.
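The F_absorb entry requested in the estout call is, to our knowledge, stored by areg rather than by reg or xtreg, so it stays empty for the three models compared here. A sketch of fitting the same fixed effect model with areg, which absorbs the airline dummies instead of listing them (the stored name areg is only an illustrative label):

```stata
* LSDV with the airline indicators absorbed instead of listed
areg cost output fuel load, absorb(airline)
estimates store areg
```

Adding areg to the list of stored models in the estout call would then put it in the comparison table as a fourth column.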
Which estimation is best for us?
LSDV is generally preferred because it gives correct estimates, goodness-of-fit measures, and group/time-specific intercepts. However, if the number of entities and/or time periods is large, say over 100 groups, xtreg provides a less painful and more elegant solution, including the F-test for fixed effects.