frequencies for intervals

#1
Hi,

I would like to ask how I can make a frequency table for a variable in which each value occurs only once (e.g. income). I have a set of 300 different income values from respondents, and I would like a table of frequencies by interval, along the lines of "income 0-100 = XX times, income 101-200 = YY times" etc. (that is the content I am after, not the format).
How can I do that? I haven't figured out a way to do it with the "tabulate" or "table" commands...
Thank you!
 

bukharin

RoboStataRaptor
#2
You need to recode the continuous variable (e.g. income) into a categorical variable. Examples:
Code:
recode income (0/100=1 "0-100") (100/200=2) (...you get the idea...) (900/max=10 ">900"), gen(incomecat)
or
Code:
egen incomecat=cut(income), at(0(100)1000) label
You need to be careful with the category boundaries in -recode-. If your second rule were (101/200), an income of 100.5 would match no rule and so wouldn't be recoded. The overlapping rules I've suggested work because once a value has matched a rule and been recoded, it is not tested against any subsequent rules, so an income of exactly 100 ends up in the first category. See -help recode- and the Stata User's Guide.
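
To make the boundary behaviour concrete, here is a minimal sketch on a made-up five-row dataset (the values and the names cat_overlap and cat_gap are just for illustration, not from the original question). It compares overlapping rules with integer-style rules that leave a gap:

```stata
clear
input income
0
100
100.5
200
950
end

* Overlapping rules: the first matching rule wins, so 100 lands in category 1
recode income (0/100=1 "0-100") (100/200=2 "100-200") (200/max=3 ">200"), gen(cat_overlap)

* Integer-style rules leave a gap: 100.5 matches no rule and keeps its value
recode income (0/100=1) (101/200=2) (201/max=3), gen(cat_gap)

list income cat_overlap cat_gap
```

In the listing, cat_gap should still show 100.5 for the third row, which is exactly the kind of stray value that would spoil a later -tab-.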

-egen- requires less typing, but the first number in at() (in this case 0) needs to be less than or equal to the lowest value in your dataset, and the last number (in this case 1000) needs to be greater than the highest income in your dataset; values outside that range are set to missing. So you need to look at the data first. Of course you should already be doing that...
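
As a rough illustration of why the range matters, the sketch below uses a small invented dataset (the six values are assumptions, chosen to sit on the boundaries). It checks the range with -summarize- and then shows what happens to a value equal to the last cutpoint:

```stata
clear
input income
0
99
100
250
999
1000
end

* Look at the minimum and maximum before choosing the cutpoints
summarize income

* Intervals are closed on the left: [0,100), [100,200), ..., [900,1000)
egen incomecat = cut(income), at(0(100)1000) label

* 1000 is not below the last cutpoint, so it ends up missing
list income incomecat
count if missing(incomecat) & !missing(income)
```

If that count is not zero, the at() list does not cover all of the data and needs a lower first number or a higher last number.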

After you've recoded, it's simply a matter of:
Code:
tab incomecat
 

bukharin

RoboStataRaptor
#4
You're welcome. If you're lazy like me you can extract the maximum value using -summarize-, then plug the result directly into -egen-:
Code:
summarize income, meanonly
egen incomecat=cut(income), at(0(100)1000 `=r(max)+1')
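
One caveat: the numbers in at() have to be in ascending order, so the line above relies on r(max)+1 being larger than 1000, and everything above 1000 goes into a single top category. A possible variant, sketched here on invented data, rounds the maximum up to the next multiple of 100 and keeps equal-width intervals all the way up. The local macro name top and the three income values are my own assumptions for the example:

```stata
clear
input income
0
150
1234
end

summarize income, meanonly
* Round the maximum up to the next multiple of 100, so the last
* cutpoint is strictly greater than r(max)
local top = 100 * (floor(r(max)/100) + 1)
egen incomecat = cut(income), at(0(100)`top') label
tab incomecat
```

This still assumes that the lowest income is not below 0; a negative value would need a lower first cutpoint.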

The manual entry for -egen- is well worth reading - repeatedly - since it will frequently save you a lot of time...