ANOVA or Poisson Regression

#1
Hi all,

I want to get some feedback on the approaches I've considered for an analysis. Here is the setup:

Unit of analysis: Physician
Variables: Physician Specialty, Number of Total Patients they take care of, Number of patients who are diabetic.

A typical line of data looks like this:
Code:
Physician  Specialty     Total Patients  Diabetic Patients
1          Cancer        50              25
2          Internal Med  30              10
3          Pediatrics    20              5
4          Cancer        40              5

N=10,000 physicians in total.
Our goal is to compare the specialties in terms of their diabetic rates.

Our first approach involved computing a physician-level rate (diabetic patients / total patients) as the outcome. Once we have these for each physician, we run a simple one-way ANOVA on them, with specialty as the between-groups factor.
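To make the first approach concrete, here is a minimal sketch of the one-way ANOVA F statistic computed by hand on the four example rows above. It uses only those four rows, so two of the groups contain a single physician and the result is purely illustrative; the function name `one_way_anova_f` is my own and not from any library.

```python
from collections import defaultdict

# The four example rows from the post: (specialty, total patients, diabetic patients)
rows = [
    ("Cancer", 50, 25),
    ("Internal Med", 30, 10),
    ("Pediatrics", 20, 5),
    ("Cancer", 40, 5),
]

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict mapping group -> list of values."""
    values = [v for vs in groups.values() for v in vs]
    grand_mean = sum(values) / len(values)
    # Between-group sum of squares: group size times squared deviation of group mean
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in groups.values()
    )
    # Within-group sum of squares: squared deviations from each group's own mean
    ss_within = sum(
        (v - sum(vs) / len(vs)) ** 2 for vs in groups.values() for v in vs
    )
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Outcome for the ANOVA: each physician's own proportion of diabetic patients
props = defaultdict(list)
for specialty, n_total, n_diabetic in rows:
    props[specialty].append(n_diabetic / n_total)

f_stat, df_b, df_w = one_way_anova_f(props)
print(f"F({df_b}, {df_w}) = {f_stat:.4f}")
```

Note that every physician counts equally here, whether they see 20 patients or 50, which is one of the things the count-based models handle differently.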

The second approach is to use a Poisson regression. Here I would model the number of diabetic patients as a function of the specialty, with the total number of patients as an offset term.
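As a rough sketch of what that model estimates: when specialty is the only predictor and log(total patients) is the offset, the Poisson maximum-likelihood rate for each specialty has a closed form, namely pooled diabetic count divided by pooled total patients. The snippet below uses that shortcut on the four example rows instead of a GLM library; the choice of "Cancer" as the reference level is arbitrary and only for illustration.

```python
import math
from collections import defaultdict

# The four example rows from the post: (specialty, total patients, diabetic patients)
rows = [
    ("Cancer", 50, 25),
    ("Internal Med", 30, 10),
    ("Pediatrics", 20, 5),
    ("Cancer", 40, 5),
]

# Pool counts and exposure (total patients) within each specialty
diabetic = defaultdict(int)
exposure = defaultdict(int)
for specialty, n_total, n_diabetic in rows:
    diabetic[specialty] += n_diabetic
    exposure[specialty] += n_total

# Closed-form Poisson MLE for a specialty-only model with offset log(n_total)
rate = {s: diabetic[s] / exposure[s] for s in diabetic}

reference = "Cancer"  # arbitrary reference level (an assumption for this sketch)
# These are the coefficients a log-link model would report, relative to the reference
log_rate_ratio = {s: math.log(rate[s] / rate[reference]) for s in rate}
rate_ratio = {s: math.exp(b) for s, b in log_rate_ratio.items()}

for s in rate:
    print(f"{s:13s} rate={rate[s]:.3f}  rate ratio vs {reference}={rate_ratio[s]:.3f}")
```

The exponentiated coefficients are rate ratios, which is the interpretive convenience mentioned below. With additional covariates there is no closed form and an iterative GLM fit would be needed.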

I know the second way is probably more sound. My question is whether the first method is really that poor. We plan to submit this for journal publication eventually, so if going with option 2 will nip any reviewer comments in the bud, I'm happy to go there.

Note: what we have here isn't a rate per se but a proportion, since the diabetic patients are a subset of the total patients (i.e. # diabetics ≤ # total patients), so logistic regression would be more appropriate. However, for interpretive reasons, Poisson regression would be better for us and the audience.


TIA
 

j58
#2
I'm not crazy about either ANOVA or Poisson regression for this problem. For ANOVA, theoretically, neither the normality nor the homoscedasticity assumption is valid. The latter is automatically violated because you are modeling proportions, and the variance of an observed proportion depends on the true proportion and on the number of patients: Var(p_hat) = p(1-p)/n; thus if p (or n) varies across groups, so will the variance. That said, in practice, if your proportions are not too widely spread among groups, then the assumptions can be approximately satisfied. So, if those assumptions check out for your data set, ANOVA can be a reasonable option.
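To illustrate the heteroscedasticity point, here is a small sketch that plugs each of the four example physicians into the usual binomial variance of an observed proportion, p(1-p)/n. Treating each physician's observed proportion as if it were the true p is an assumption made only for illustration.

```python
# The four example rows from the post: (specialty, total patients, diabetic patients)
rows = [
    ("Cancer", 50, 25),
    ("Internal Med", 30, 10),
    ("Pediatrics", 20, 5),
    ("Cancer", 40, 5),
]

variances = []
for specialty, n_total, n_diabetic in rows:
    p = n_diabetic / n_total
    # Binomial variance of an observed proportion based on n_total patients
    var_p = p * (1 - p) / n_total
    variances.append(var_p)
    print(f"{specialty:13s} p={p:.3f}  n={n_total:3d}  Var(p_hat)={var_p:.5f}")

# Ratio of largest to smallest variance; 1.0 would mean equal variances
spread = max(variances) / min(variances)
print(f"largest / smallest variance = {spread:.2f}")
```

Even in this tiny example the variances differ by a factor of more than three, driven by both the proportion and the panel size.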

As to Poisson regression, the Poisson model is inappropriate, because you are not modeling rates (as you note), but proportions. A rate can vary from 0 to infinity, but your values, proportions (count/offset), can only vary from 0 to 1. So a Poisson model is not likely to fit these data well.

Since you are modeling proportions, logistic regression is the natural approach to use. And since proportions might differ among physicians' practices within a specialty, a mixed logistic model with a random term for physicians would probably be optimal.
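A rough sketch of both ideas on the four example rows: with specialty as the only predictor, the fitted probability from a binomial logistic model is simply the pooled proportion in each specialty, so odds ratios can be computed directly. The Pearson dispersion statistic afterwards is one informal way to see whether physicians within a specialty vary more than a plain binomial model allows, which is what a random term for physicians would absorb. Nothing here fits an actual mixed model, and the reference level "Cancer" is an arbitrary choice.

```python
import math
from collections import defaultdict

# The four example rows from the post: (specialty, total patients, diabetic patients)
rows = [
    ("Cancer", 50, 25),
    ("Internal Med", 30, 10),
    ("Pediatrics", 20, 5),
    ("Cancer", 40, 5),
]

by_specialty = defaultdict(list)
for specialty, n_total, n_diabetic in rows:
    by_specialty[specialty].append((n_total, n_diabetic))

# Fitted probability per specialty for a specialty-only logistic model
pooled = {
    s: sum(y for _, y in obs) / sum(n for n, _ in obs)
    for s, obs in by_specialty.items()
}
log_odds = {s: math.log(p / (1 - p)) for s, p in pooled.items()}

reference = "Cancer"  # arbitrary reference level (an assumption for this sketch)
odds_ratio = {s: math.exp(log_odds[s] - log_odds[reference]) for s in log_odds}

def pearson_chi2(obs, p):
    """Pearson chi-square of physician counts against a common probability p."""
    return sum((y - n * p) ** 2 / (n * p * (1 - p)) for n, y in obs)

chi2 = sum(pearson_chi2(obs, pooled[s]) for s, obs in by_specialty.items())
df = len(rows) - len(by_specialty)  # physicians minus fitted specialty parameters
dispersion = chi2 / df

for s in odds_ratio:
    print(f"{s:13s} p={pooled[s]:.3f}  odds ratio vs {reference}={odds_ratio[s]:.3f}")
print(f"Pearson dispersion = {dispersion:.2f} on {df} df")
```

A dispersion well above 1 hints at extra between-physician variation, though with only four rows (and a single degree of freedom) this is an illustration and not evidence about the real data.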
 
#3
Hi j58,

Many thanks for your reply and valuable insights. I'll definitely look into the ANOVA assumptions more closely when I get back into the office. I like the idea of a random-intercept logit model to capture the between-physician variability, and will bring these suggestions back to the group.

Best,