OLS with fixed effects in Stata

#1
Hi all,

On Monday I started a topic on estimating OLS with fixed effects in SPSS.
That turned out to be impossible, so now I want to do it in Stata.

I need to estimate the following revenue equation:
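$$y_{it} = a_i + \beta_1 x_{1,it} + \beta_2 x_{2,it} + \dots + \beta_6 x_{6,it} + \delta_t + \varepsilon_{it}$$
where $a_i$ is the bank-specific effect of bank $i$ and $\delta_t$ is the effect of year $t$ (the time dummies).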

The idea behind this equation is that by summing the coefficients on x1, x2 and x3
I can say something about the competitive state of the banking market in a given country (in the literature: the H-statistic).
Fixed effects are included to capture bank-specific effects, which vary between banks but not over years.
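A minimal sketch of how the H-statistic could be computed after the fixed-effects regression: -lincom- gives the sum of the coefficients on x1, x2 and x3 with its standard error, and -test- checks the two benchmarks usually examined in this literature (H = 0 and H = 1). The data are simulated and every variable is a hypothetical stand-in; in a real application y and x1-x3 would typically be logged revenue and input prices, so that the coefficients are elasticities.

```stata
* Simulated stand-in for one country's bank panel (all values hypothetical)
clear
set seed 101
set obs 50                                // 50 hypothetical banks
generate bankid = _n
generate a_i = rnormal()                  // bank-specific effect
expand 14                                 // 14 years per bank
bysort bankid: generate year = 2000 + _n
drop if runiform() < 0.1                  // make the panel unbalanced
forvalues k = 1/6 {
    generate x`k' = rnormal() + 0.5*a_i
}
generate y = a_i + 0.3*x1 + 0.2*x2 + 0.1*x3 + 0.1*x4 - 0.1*x5 + rnormal()

xtset bankid year
xtreg y x1 x2 x3 x4 x5 x6 i.year, fe

* H-statistic: sum of the coefficients on x1-x3, with its standard error
lincom x1 + x2 + x3
scalar H = r(estimate)

* Benchmarks commonly tested for H
test x1 + x2 + x3 = 0
test x1 + x2 + x3 = 1
```

-lincom- is preferable to adding the coefficients by hand because it also reports a standard error and confidence interval for the sum.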

The dataset is an unbalanced panel of bank observations covering 14 years and 15 countries.
I have to run a separate regression for every country.
Are these the right Stata commands?

xtset bankid year (not sure about this one)
xtreg y x1 x2 x3 x4 x5 x6 i.year, fe

or (would this give the same result?)

regress y x1 x2 x3 x4 x5 x6 i.bankid i.year

(bankid is the identification number of a bank; it differs between banks but is constant over years)
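A sketch, on simulated data, of the check behind the "resulting in the same?" question. With the panel declared by -xtset bankid year- (panel identifier and time variable), -xtreg ..., fe-, -regress- with explicit i.bankid dummies, and -areg ..., absorb(bankid)- (an extra alternative not mentioned in the thread) should give identical slope coefficients. The reported R-squared differs (-xtreg, fe- reports a within R-squared), and cluster-robust standard errors can differ between these commands, so the sketch compares coefficients only. All variables are hypothetical.

```stata
clear
set seed 202
set obs 50                                // 50 hypothetical banks
generate bankid = _n
generate a_i = rnormal()
expand 14
bysort bankid: generate year = 2000 + _n
drop if runiform() < 0.1                  // unbalanced panel
forvalues k = 1/6 {
    generate x`k' = rnormal() + 0.5*a_i   // regressors correlated with a_i
}
generate y = a_i + 0.3*x1 + 0.2*x2 + 0.1*x3 + rnormal()

xtset bankid year                         // panel id and time variable

xtreg y x1 x2 x3 x4 x5 x6 i.year, fe
scalar b_xtreg = _b[x1]

regress y x1 x2 x3 x4 x5 x6 i.bankid i.year
scalar b_regress = _b[x1]

areg y x1 x2 x3 x4 x5 x6 i.year, absorb(bankid)
scalar b_areg = _b[x1]

display "xtreg: " b_xtreg "   regress: " b_regress "   areg: " b_areg
```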

PS: How can I include only the observations for which country is, for example, Belgium
(in SPSS this is called a 'selection variable')? Or should I drop the other countries beforehand?

Thanks in advance!
 

bukharin

RoboStataRaptor
#2
Yes, use the -xtreg- syntax that you showed. You can run the regression just in Belgium by using the -if- qualifier, for example:
Code:
xtreg y x1 x2 x3 x4 x5 x6 i.year if country=="Belgium", fe
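On the second half of the PS (dropping the other countries beforehand), a sketch on simulated data showing that the -if- qualifier and a -keep- wrapped in -preserve-/-restore- give the same estimates. The -if- route simply leaves the full dataset in memory. The countries other than Belgium and all variables are hypothetical.

```stata
clear
set seed 303
set obs 60                                // 60 hypothetical banks
generate bankid = _n
generate str7 country = cond(bankid <= 20, "Belgium", cond(bankid <= 40, "France", "Spain"))
generate a_i = rnormal()
expand 14
bysort bankid: generate year = 2000 + _n
forvalues k = 1/6 {
    generate x`k' = rnormal() + 0.5*a_i
}
generate y = a_i + 0.3*x1 + 0.2*x2 + 0.1*x3 + rnormal()
xtset bankid year

* Option 1: restrict the estimation sample with -if-
xtreg y x1 x2 x3 x4 x5 x6 i.year if country == "Belgium", fe
scalar b_if = _b[x1]

* Option 2: drop the other countries temporarily
preserve
keep if country == "Belgium"
xtreg y x1 x2 x3 x4 x5 x6 i.year, fe
scalar b_keep = _b[x1]
restore                                   // all countries are back in memory
```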
You would probably find it more convenient to use -statsby-, for example:
Code:
statsby, by(country) clear: xtreg y x1 x2 x3 x4 x5 x6 i.year, fe
This will replace your dataset with a dataset containing the coefficients for each country.
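-statsby- also accepts named expressions if you want more than the coefficients. A sketch on simulated data (countries and variables hypothetical) that collects, for each country, the H-statistic (the sum of the coefficients on x1-x3), the coefficient on x1 and its standard error.

```stata
clear
set seed 404
set obs 60                                // 60 hypothetical banks
generate bankid = _n
generate str7 country = cond(bankid <= 20, "Belgium", cond(bankid <= 40, "France", "Spain"))
generate a_i = rnormal()
expand 14
bysort bankid: generate year = 2000 + _n
drop if runiform() < 0.1                  // unbalanced panel
forvalues k = 1/6 {
    generate x`k' = rnormal() + 0.5*a_i
}
generate y = a_i + 0.3*x1 + 0.2*x2 + 0.1*x3 + rnormal()
xtset bankid year

* One row per country; -clear- lets statsby replace the bank data in memory
statsby H=(_b[x1] + _b[x2] + _b[x3]) b_x1=(_b[x1]) se_x1=(_se[x1]), ///
    by(country) clear: xtreg y x1 x2 x3 x4 x5 x6 i.year, fe
list
```

Naming the expressions keeps the resulting variable names short and avoids long generated names for the year-dummy coefficients.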
 
#3
A question about -statsby-: why would you want to use the clear option? Wouldn't you lose your dataset? It seems it would be better to use:
statsby, by(country) saving(filename): xtreg y x1 x2 x3 x4 x5 x6 i.year, fe
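A sketch of the -saving()- route, using the auto data shipped with Stata. A -tempfile- stands in for a permanent file name (an assumption for the sketch; the tempfile is deleted when the do-file ends), and the per-group results are then loaded with -use-.

```stata
sysuse auto, clear
tempfile results                          // stand-in for a permanent filename

* Per-group coefficient and standard error are written to `results'
statsby b_mpg=(_b[mpg]) se_mpg=(_se[mpg]), by(foreign) saving(`results'): ///
    regress price mpg

* Later, load the saved estimates
use `results', clear
list
```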
 

bukharin

RoboStataRaptor
#4
Yes, that's a very reasonable approach.

I personally prefer to use the "clear" option because: (1) it avoids saving the dataset to disk and I am often working in a read-only folder, (2) I probably don't want a saved dataset of the results anyway - I can always re-create them by re-running my do-file, (3) you can access the results immediately to do things like graph them, (4) you can wrap the -statsby- and follow-up commands inside -preserve- and -restore- so that you don't actually lose your dataset. Finally, this type of command is often the last stage of an analysis so you don't actually need the dataset any more (of course the original dataset is saved - you're just losing it for this particular run of your do-file).

However, it completely depends on your workflow; in many cases using the saving() option will make more sense. For example, it does if the main result of your analysis is the set of coefficients and you want to put them into a table in a word processor, or if you just want a simple saved summary of your analysis rather than having to re-run your do-file (especially if your analysis takes a long time to run).

Here's an example of using -statsby- within a -preserve- & -restore- block:
Code:
sysuse auto, clear
preserve
statsby, by(foreign) clear: regress price mpg
list
restore
 
#5
Hi there

I have the same problem, I think! What I really want to do is show the coefficients for all the different groups, for one of the independent variables at a time, in a line graph together with the within estimator (the overall coefficient, I guess).

In my case, I would then show how the level of taxation is associated with the number of entrepreneurs (as % of population) in each country, along with the overall fixed-effects within estimator.

I use TSCS (time-series cross-section) data on 40 countries over 11 years, and my panel is unbalanced.

Now my model looks like this: xtreg y x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12, fe vce(cluster Country_No)

So far, I have tried running a plain -regress- with both -parmby- and -statsby-, like this:

parmby "reg y x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12", by(Country) label saving(model2,replace)
statsby, clear by(Country): reg y x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12

The problem is that this leaves out the country dummy variables, and the estimates are way off compared with my original estimate (or at least I think so: I simply take the mean of each variable's coefficients across the different countries, and that mean is nothing like my FE estimate).
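A sketch of why a simple average of per-country slopes need not match the FE estimate. With a single regressor and no time dummies, the within (FE) slope is exactly a weighted average of the per-country OLS slopes, each weighted by that country's within-country sum of squares of x; the unweighted mean gives every country equal weight instead. The data are simulated, with country-specific true slopes and different spreads of x by country (both assumptions chosen so the two averages differ visibly).

```stata
clear
set seed 606
set obs 40                                  // 40 hypothetical countries
generate country = _n
generate a_i = rnormal()                    // country fixed effect
generate slope = 0.5 + 0.5*rnormal()        // heterogeneous true slopes
generate sdx = 0.5 + 2*runiform()           // within-country spread of x
expand 11                                   // 11 years per country
bysort country: generate year = 2000 + _n
generate x = a_i + sdx*rnormal()
generate y = a_i + slope*x + rnormal()

xtset country year
xtreg y x, fe
scalar b_fe = _b[x]

* Per-country OLS slope b_i = Sxy_i / Sxx_i, from within-country deviations
bysort country: egen xbar = mean(x)
bysort country: egen ybar = mean(y)
generate dxdy = (x - xbar)*(y - ybar)
generate dx2  = (x - xbar)^2
bysort country: egen sxy = total(dxdy)
bysort country: egen sxx = total(dx2)
generate b_i = sxy/sxx
egen first = tag(country)                   // one row per country

summarize b_i if first                      // unweighted mean of slopes
scalar b_mean = r(mean)
summarize b_i if first [aweight = sxx]      // weighted by Sxx_i
scalar b_wmean = r(mean)

display "FE: " b_fe "   mean of slopes: " b_mean "   Sxx-weighted mean: " b_wmean
```

With several regressors the weights become matrices rather than scalars, but the same point holds: the FE estimate is not the plain average of country-by-country coefficients.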

I have also tried running -statsby- and -parmby- with -xtreg, fe-, but that doesn't produce any results at all (which I don't blame it for, since it needs more than one country at a time).

So my question is: what do I do? I have simply given up, and you are my last hope :)
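A sketch of the graph described in this post, on simulated data: one regressor (a hypothetical -tax- variable), country-by-country slopes collected with -statsby-, and the FE within estimate drawn as a horizontal reference line. It deliberately uses a single regressor: with at most 11 years per country, a per-country regression with 12 regressors plus a constant has more parameters than observations, so not all coefficients can be identified. A scatter is used instead of a line graph because the countries have no natural order; both choices are assumptions of the sketch.

```stata
clear
set seed 707
set obs 40                                   // 40 hypothetical countries
generate Country_No = _n
generate a_i = rnormal()
generate slope = 0.4 + 0.3*rnormal()         // country-specific true slopes
expand 11
bysort Country_No: generate year = 2000 + _n
generate tax = a_i + rnormal()               // hypothetical regressor
generate y = a_i + slope*tax + rnormal()

xtset Country_No year
xtreg y tax, fe vce(cluster Country_No)
scalar b_fe = _b[tax]                        // pooled within estimate

* Replace the data with one slope per country
statsby b_tax=(_b[tax]) se_tax=(_se[tax]), by(Country_No) clear: regress y tax

* Country slopes with the FE estimate as a horizontal reference line
twoway scatter b_tax Country_No, yline(`=scalar(b_fe)') ///
    ytitle("Slope on tax") xtitle("Country")
```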