Combinatorics in Stata 12


I am trying to generate new variables in a Stata dataset that capture all pairwise and three-way combinations of 14 conditions. The conditions are listed as values 1 to 14 of a single variable, 'chrondis'.

  chrondis |      Freq.     Percent        Cum.
-----------+-----------------------------------
    cardio |         60        0.95        0.95
 cataracts |        106        1.67        2.62
    highbp |        623        9.83       12.45
  highchol |      1,214       19.16       31.61
    stroke |         54        0.85       32.47
  diabetes |        308        4.86       37.33
   lungdis |        101        1.59       38.92
    asthma |        369        5.82       44.74
  artritis |      1,581       24.95       69.70
  osteoart |        611        9.64       79.34
    cancer |        448        7.07       86.41
   parkins |         37        0.58       86.99
 stomachul |        554        8.74       95.74
   hipfrac |        270        4.26      100.00
-----------+-----------------------------------
     Total |      6,336      100.00

Alternatively, there are 14 binary variables, one per condition (0 = condition not present, 1 = condition present).

Is there a way to calculate all possible unique two-way and three-way combinations (without repetition, order not important) in Stata 12?



How about something like:
local all14 cardio cataracts highbp highchol stroke diabetes lungdis asthma artritis osteoart cancer parkins stomachul hipfrac
local groupnum=1
* loop over index pairs i < j so that each unordered pair is used exactly once
forvalues i = 1/13 {
    local var1 : word `i' of `all14'
    local jstart = `i' + 1
    forvalues j = `jstart'/14 {
        local var2 : word `j' of `all14'
        egen group`groupnum++'=group(`var1' `var2')
    }
}
So this will give you, for example, a variable called group1 which will take the values of 1-4 for the four possible combinations of -cardio- and -cataracts-. Is that what you're after?

Obviously the 3-way combinations would be an extension of the above.
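
As a rough sketch of that three-way extension (not the poster's own code), the loop below walks index triples i < j < k so that each unordered triplet is used once. The toy data, the seed and the variable prefix trio are assumptions made purely for illustration; with the real dataset in memory the toy-data lines would be dropped.

```stata
* toy data for illustration only (hypothetical); drop these lines for real data
clear
set seed 12345
set obs 200
local all14 cardio cataracts highbp highchol stroke diabetes lungdis asthma artritis osteoart cancer parkins stomachul hipfrac
foreach v of local all14 {
    gen byte `v' = runiform() < 0.3
}

* one grouping variable per unordered triplet (i < j < k)
local groupnum = 1
forvalues i = 1/12 {
    local var1 : word `i' of `all14'
    local jstart = `i' + 1
    forvalues j = `jstart'/13 {
        local var2 : word `j' of `all14'
        local kstart = `j' + 1
        forvalues k = `kstart'/14 {
            local var3 : word `k' of `all14'
            * trio# is a hypothetical name; egen numbers only the patterns that occur
            egen trio`groupnum' = group(`var1' `var2' `var3')
            local ++groupnum
        }
    }
}
```

This creates 364 variables, one per triplet. Each takes at most eight values, and egen's group() numbers only the patterns actually observed, so the meaning of a given value can differ from one triplet to the next.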

This seems a strange thing to do; may I ask why you're doing this?

Thanks for the feedback. I actually figured out another way and have included the syntax below for anyone interested in doing this again.

For any list of 14 binary variables named v1-v14:

* three-way combinations (i < j < k, so each triplet appears once)
forvalues i = 1/12 {
    local jstart = `i' + 1
    forvalues j = `jstart'/13 {
        local kstart = `j' + 1
        forvalues k = `kstart'/14 {
            gen comb_`i'_`j'_`k' = v`i'+v`j'+v`k'
        }
    }
}
* two-way combinations (i < j, so each pair appears once)
forvalues i = 1/13 {
    local jstart = `i' + 1
    forvalues j = `jstart'/14 {
        gen comb_`i'_`j' = v`i'+v`j'
    }
}
@ Bukharin - I am doing a study on the effects of combinations of chronic disease on health outcomes. I am trying to figure out whether some combinations have more profound health outcomes than others.
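
Note that each comb_ variable above is a sum, so it counts how many of the conditions are present rather than flagging the full combination. A minimal sketch of turning the sums into 0/1 indicators follows; the toy data and the has_ variable names are hypothetical, and it assumes the v variables are coded strictly 0/1.

```stata
* toy data for illustration only (hypothetical); replace with the real v1-v14
clear
set obs 8
gen byte v1 = mod(_n, 2)
gen byte v2 = mod(floor((_n - 1) / 2), 2)
gen byte v3 = _n > 4
gen comb_1_2 = v1 + v2
gen comb_1_2_3 = v1 + v2 + v3

* indicator is 1 only when every condition in the combination is present;
* the if qualifier keeps missing sums missing instead of coding them 0
gen byte has_1_2 = (comb_1_2 == 2) if !missing(comb_1_2)
gen byte has_1_2_3 = (comb_1_2_3 == 3) if !missing(comb_1_2_3)
tabulate has_1_2
```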


Thanks for posting your solution - I wish more people would do that!

Isn't this just a series of interactions? So for example if you were using linear regression and you wanted to test the difference between these two groups:
cardio + cataracts
highbp + highchol

You could do something like:
regress outcome cardio##cataracts highbp##highchol
lincom 1.cardio + 1.cataracts + 1.cardio#1.cataracts - (1.highbp + 1.highchol + 1.highbp#1.highchol)

The other thing to be aware of is type I error. With 455 possible combinations (91 pairs plus 364 triplets) for 6,336 patients, you should expect to see a large number of false positives.
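
To put rough numbers on that, here is a small sketch of the arithmetic. It assumes one test per combination at a significance level of 0.05, and it uses a Bonferroni threshold only as one simple, conservative illustration of an adjustment, not as a recommendation made in this thread.

```stata
* number of two-way and three-way combinations of 14 conditions
local npairs = comb(14, 2)
local ntriples = comb(14, 3)
local ntests = `npairs' + `ntriples'
display "pairs: `npairs'  triplets: `ntriples'  total: `ntests'"

* assumed significance level (an illustration, not taken from the thread)
local alpha = 0.05

* expected number of false positives if every null hypothesis were true
display "expected false positives: " `ntests' * `alpha'

* Bonferroni-adjusted per-test threshold
display "Bonferroni threshold: " `alpha' / `ntests'
```

With 455 tests at 0.05, about 23 would be expected to come out significant by chance alone even if no combination had any effect.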

I have a variable that records the means of transport used, with the following codes:
4=on foot
5=others (mention)
The problem is that under "others" respondents mention means of transport that are more or less the same as those already in the categories (e.g. "gari" in Swahili, which means "car" in English).
I am now sitting with two variables: the original means of transport and a second variable that describes the other means.
My question is: which Stata command can join the two variables into one, given that some of their values are similar?