[Stata] - Please help calculating Herfindahl index

Hmm it seems that even var1 is not correct because you are not summing over several observations. Just a question: does your dataset look like this:

obs1 firm1 sales_of_firm_1
obs2 firm2 sales_of_firm_2

I ask because it is not clear whether you want to calculate several Herfindahl indexes or just one to characterize your dataset.



New Member
The data look like this

company 1 .....var ..... sales in th euro 1
company 2 ...... var ..... sales 2
company more10000 ........ etc...

and I have to calculate the Herfindahl index.

How can this be done? Thanks.

So first you have to generate a variable for the square of the sales:

gen sales2 = sales^2

Then calculate the sum of the square, for instance like this:

collapse (sum) H=sales2

If you want to calculate the modified version of the H index, you may want to store the number of companies in your dataset before using the collapse command. For instance you can do this:

count
local N = r(N)

and after using the collapse command you just need to write this:

gen HH = (H-1/`N')/(1-1/`N')

Hope this helps!


New Member
Thank you Etienne

Actually, I wasn't accurate: the equation shouldn't be
egen var1=sum(sales^2)
but rather
egen var1=sum(marketshare^2), where marketshare=(sales/x) and egen x=sum(sales)

Then I did gen Marketshare2=(Marketshare)^2
and gen HH=sum(Marketshare2). I got a different value for each observation, but the value for the final observation equals, say, .00***xx

When I applied your command
collapse (sum) H=marketshare2
it dropped all variables and left H as one constant in a single observation, =.00***xx (the same number).

When I applied the equations you wrote
count
local N = r(N)
gen Herfindahl = (H-1/`N')/(1-1/`N')

first, using your var H, I got the same index for each company;

second, using my var HH, I got a different index for each company.

So:
1- I think the Herfindahl index should be different for each company. What do you think? It's my first time dealing with this index.

2- What do you think about the procedure shown above?

3- What exactly does this mean:
count
local N = r(N)


Basically the Herfindahl index is an indicator of the level of competition of a given market. Therefore you must have just one value for this index to characterize a given market (it does not vary across companies on this market).

When you use the commands I wrote you do end up with one observation corresponding to the index for your dataset. I agree that it does not make much sense to have a dataset with one observation so you may want Stata to display the value of the index in the "results" window instead of collapsing all the observations.
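
The different value for each observation reported in the question is what one would expect from gen with sum(): inside generate, sum() is a running sum, while egen puts the overall total on every row. Here is a minimal sketch on made-up data (the three sales figures and all variable names are assumptions for illustration; total() is the current name of the egen sum() function used in this thread). It also shows one way to display the index without collapsing:

```stata
* Toy data: three hypothetical firms (values are illustrative only)
clear
input firm sales
1 50
2 30
3 20
end

* Market shares and their squares
egen totsales = total(sales)
gen share  = sales/totsales
gen share2 = share^2

* Running sum: a different value on each row, the last row holds the total
gen running = sum(share2)

* Overall sum: the same value on every row
egen H = total(share2)

list firm share running H
display "Herfindahl index = " H[1]
```

With these numbers the shares are .5, .3 and .2, so H is .25 + .09 + .04 = .38, which is also the value of the running sum in the last row.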

To calculate the index with the market shares, you do want to apply the following commands:

egen x = sum(sales)
gen share = sales/x
gen share2 = share^2

After doing this you just have to sum share2 over the different companies. Looking at your Wikipedia link, it seems that the index is calculated over the 50 largest companies. My suggestion is to create an indicator for these top companies, for instance:

gsort -share
gen top = _n < 51

Then you just have to sum the squared market share over these companies:

bysort top : egen H = sum(share2)

and display the value of H, for instance:

sum H if top == 1
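
To see the top-firms steps end to end, here is a small sketch on invented data. It assumes four hypothetical firms and a cutoff of the 2 largest instead of 50, purely to keep the example short. It uses bysort rather than by, because after sorting on share the data are not sorted on top:

```stata
* Toy data: four hypothetical firms (values are illustrative only)
clear
input firm sales
1 40
2 30
3 20
4 10
end

egen totsales = total(sales)
gen share  = sales/totsales
gen share2 = share^2

* Largest firms first, then flag the top 2 (the thread uses the top 50)
gsort -share
gen top = _n <= 2

* Sum of squared shares within each group (top and non-top)
bysort top: egen H = total(share2)

* Show the index for the top group
summarize H if top == 1
```

Here the two largest shares are .4 and .3, so the value shown for the top group is .16 + .09 = .25.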

On the other hand, if you want to calculate the index for all companies and then calculate the normalized index that takes into account the number of firms in your dataset, you want to store the number of companies first:

count
local N = r(N)

"count" counts the number of observations, and "local N = r(N)" stores that number in a local macro called N. (After most commands Stata leaves its results in r(); you can see them by typing "return list". For estimation commands type "ereturn list".)

and then calculate your normalized index and display the results:

egen Hnorm = sum(share2)
replace Hnorm = (Hnorm-1/`N')/(1-1/`N')
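
A small worked sketch of the normalized index, assuming the usual definition (H - 1/N)/(1 - 1/N) and three made-up firms (data and variable names are assumptions for illustration). Note the parentheses around the denominator: without them Stata would evaluate (H - 1/N)/1 and then subtract 1/N, because division binds tighter than subtraction.

```stata
* Toy data: three hypothetical firms (values are illustrative only)
clear
input firm sales
1 50
2 30
3 20
end

egen totsales = total(sales)
gen share2 = (sales/totsales)^2
egen H = total(share2)

* Number of firms, taken from the result that count leaves in r(N)
count
local N = r(N)

* Normalized index: 0 when all shares are equal, 1 for a monopoly
gen Hnorm = (H - 1/`N')/(1 - 1/`N')
display "H = " H[1] "   normalized H = " Hnorm[1]
```

With these numbers H = .38 and N = 3, so the normalized index is (.38 - 1/3)/(1 - 1/3) = .07.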

I think this answers your 3 questions.

Have a good day


I've got a question related to the Herfindahl index in panel data. Suppose we have panel data with missing values in some years for some companies across different industries. Can I still use the above Stata commands? And how would it be possible to calculate the normalised Herfindahl index?

Well I assume that market shares evolve smoothly over time so you may want to impute market shares when you have missing values. For instance you can fit a linear model in which you regress market shares on a polynomial of time for each company in your dataset.
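
As a simpler stand-in for the regression idea, here is a sketch that fills gaps by linear interpolation between a company's neighbouring years with ipolate. This is a different technique from fitting a polynomial in time, and the data and variable names (company, year, share, share_i) are assumptions for illustration:

```stata
* Toy panel: company 1 has a missing share in 2002 (values are illustrative only)
clear
input company year share
1 2001 .20
1 2002 .
1 2003 .30
2 2001 .50
2 2002 .40
2 2003 .30
end

* Linear interpolation of share over year, separately for each company
bysort company (year): ipolate share year, generate(share_i)

list company year share share_i
```

The missing 2002 value for company 1 is filled with .25, halfway between .20 and .30. Keep in mind that imputed shares need not add up to one within an industry and year, so you may want to rescale them before squaring.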

After doing this you can calculate the Herfindahl index in each industry and for each year by doing something like:

bys industry year : egen H = sum(share2)

To calculate the normalized index, you need first to count the number of companies for each industry and year cell:

bys industry year : gen N = _N

and then calculate as above:

gen Hnorm = (H-1/N)/(1-1/N)
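
Putting the panel steps together, here is a minimal sketch on invented data (industries, years, firms and sales are all assumptions for illustration). It computes the shares within each industry and year cell, and it leaves the normalized index missing when a cell has a single firm, since the denominator 1 - 1/N is then zero:

```stata
* Toy panel: two industries, values are illustrative only
clear
input industry year firm sales
1 2001 1 60
1 2001 2 40
1 2002 1 50
1 2002 2 50
2 2001 3 100
end

* Shares are computed within each industry-year cell
bysort industry year: egen totsales = total(sales)
gen share2 = (sales/totsales)^2

* Index and number of firms per cell
bysort industry year: egen H = total(share2)
bysort industry year: gen N = _N

* Normalized index; undefined (left missing) when N == 1
gen Hnorm = (H - 1/N)/(1 - 1/N) if N > 1

list industry year firm H N Hnorm
```

For industry 1 this gives H = .52 and a normalized index of .04 in 2001, and H = .5 with a normalized index of 0 in 2002 (two equal firms). Note that _N counts every row in the cell, including firms whose sales are missing, so drop or impute those rows first if they should not count.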

Hope this helps!