Minimizing impact of age variable

#1
Hi

So I'm working with a health outcomes dataset for an entire state. The dataset is also broken up into groups by location within the state. There are around 50 groups, with a total of around 3000 individuals across the state. Each individual record has medical condition variables, indicating whether they have a certain condition or not (high blood pressure, for instance).

We want to compare medical condition instances between several of the 50 sites, so we can ask the question “Why does site X have a higher instance of heart trouble than site Y?” However, site X might have a higher instance of heart trouble simply because a larger percentage of its people are in their 80s than at site Y (for this example, we assume that people in their 80s are more likely to have heart attacks). So the issue we are dealing with is minimizing the impact of age on our comparison across sites.

Any ideas on how to achieve this? Would some sort of normalization work?

Thanks,
E
 

Karabiner

TS Contributor
#3
We want to compare medical condition instances between several of the 50 sites, so we can ask the question “Why does site X have a higher instance of heart trouble than site Y?”
So you want to find out whether the rates differ between sites (or, more specifically, which sites have particularly high or low rates), and in a second step want to find out why? Or do you already have additional information about the sites/populations within the sites which you want to include in your analysis?

With kind regards

K.
 

hlsmith

Less is more. Stay pure. Stay poor.
#4
Karabiner, I agree with your paraphrasing of the question. I would also ask how the data were collected (for example, by random sampling) and whether you know if the samples are representative of each site.

I think this may fall under the purpose of standardization. See the following for a 10,000-foot view: http://en.wikipedia.org/wiki/Age_adjustment
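
To make the idea concrete, here is a minimal sketch of direct age standardization in Python. It is only an illustration under stated assumptions: each record is a (site, age band, has condition) tuple, the standard population is simply the pooled age distribution of all records, and the function name, the helper, the age bands and the counts are all made up for the example rather than taken from the dataset in question.

```python
from collections import defaultdict


def direct_age_adjusted_rates(records):
    """Directly age-standardize condition rates per site.

    records: iterable of (site, age_band, has_condition) tuples.
    The standard population is the pooled age distribution of all
    records (an assumption; an external standard could be used instead).
    """
    n = defaultdict(int)       # people per (site, age band)
    cases = defaultdict(int)   # people with the condition per (site, age band)
    std_n = defaultdict(int)   # people per age band across all sites
    sites = set()
    for site, band, has_condition in records:
        n[(site, band)] += 1
        cases[(site, band)] += int(bool(has_condition))
        std_n[band] += 1
        sites.add(site)

    total = sum(std_n.values())
    adjusted = {}
    for site in sites:
        rate = 0.0
        for band, band_total in std_n.items():
            stratum_n = n.get((site, band), 0)
            if stratum_n == 0:
                # The age-specific rate is undefined for an empty stratum.
                raise ValueError(f"site {site!r} has nobody in age band {band!r}")
            # Age-specific rate at this site, weighted by the band's
            # share of the standard population.
            rate += (cases.get((site, band), 0) / stratum_n) * (band_total / total)
        adjusted[site] = rate
    return adjusted


def make_records(site, band, people, with_condition):
    """Hypothetical helper: expand counts into individual records."""
    return [(site, band, i < with_condition) for i in range(people)]


# Made-up counts: both sites have the same age-specific rates
# (50% in the 80+ band, 10% below 80), but site X is much older.
records = (
    make_records("X", "80+", 10, 5) + make_records("X", "<80", 10, 1)
    + make_records("Y", "80+", 2, 1) + make_records("Y", "<80", 20, 2)
)
adjusted = direct_age_adjusted_rates(records)
print(adjusted)
```

In this toy data the crude rates are 6/20 = 0.30 for site X and 3/22 (about 0.14) for site Y, yet the age-adjusted rates are identical (9/42, about 0.21), because the whole crude difference comes from the age mix. With roughly 3000 people spread over 50 sites, some site-by-age cells will probably be small or empty, so coarse age bands would likely be needed; the sketch raises an error for an empty cell rather than guessing a rate.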