Correct! Is there any obvious reason why models such as ANCOVA won't work on count data? I've seen several publications using ANCOVA or equivalent on these types of problems. I don't doubt that count models are superior in this case, but I just need some arguments in case anyone asks me why I disregarded methods that would be expected in most cases.

When you use ANCOVA or other linear models that rely on ordinary least squares estimation, you assume that the conditional distribution of the dependent variable is normal, independent, and identically distributed (homoscedastic) for *any* combination of levels of the predictor variables. For example, in a simple ANOVA model, this means that we assume (among other things) that the dependent variable has the same variance within each group and is normally distributed within each group.

When working with count data, the normality assumption is obviously breached: the normal distribution is continuous and unbounded, whereas count data are restricted to a discrete set of values bounded at zero (i.e., the non-negative integers). This often manifests as a positively skewed distribution. The ordinary least squares estimator can remain unbiased, consistent, and efficient when the conditional distribution of the response variable is non-normal, provided the other assumptions hold; i.e., the point estimates of means or regression coefficients are still good estimates. But with relatively small sample sizes, confidence intervals and significance tests will be untrustworthy. With larger sample sizes this becomes less of an issue.
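To make the mismatch concrete, here is a minimal sketch. It assumes, purely for illustration, that the counts follow a Poisson distribution with a mean of 1.5 (neither the distribution nor the value comes from your data). It computes the skewness of that distribution and how much probability a normal distribution with the same mean and variance would place on impossible negative values.

```python
import math
from statistics import NormalDist


def poisson_pmf(k, lam):
    """Probability of observing exactly k events under a Poisson(lam) model."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


lam = 1.5  # hypothetical mean count, chosen only for illustration

# Truncate the support at 59; the remaining tail mass is negligible for lam = 1.5.
ks = range(60)
pmf = [poisson_pmf(k, lam) for k in ks]

mean = sum(k * p for k, p in zip(ks, pmf))
var = sum((k - mean) ** 2 * p for k, p in zip(ks, pmf))
skewness = sum((k - mean) ** 3 * p for k, p in zip(ks, pmf)) / var ** 1.5

# A normal distribution matched on mean and variance, as a linear model would assume.
matched_normal = NormalDist(mu=mean, sigma=math.sqrt(var))
mass_below_zero = matched_normal.cdf(0.0)

print(f"mean = {mean:.3f}, variance = {var:.3f}, skewness = {skewness:.3f}")
print(f"normal probability of a negative count = {mass_below_zero:.3f}")
```

Under these assumptions the skewness is clearly positive, and the matched normal distribution puts a non-trivial share of its probability below zero, where no count can fall.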

The more important issue is the likely breach of homoscedasticity. Specifically, with count data, groups that have higher means (higher average counts) tend to have higher variances as well. This breaches the assumption that the variance of the response variable is identical across all combinations of levels of the independent variables. I wouldn't be surprised if you can actually see this in your own data (if you are comparing groups of patients with quite different frequencies of illness): you may see that the group with the higher mean has a higher variance as well. As a result, the OLS estimator that ANCOVA uses is no longer an efficient estimator, i.e., it is less precise than other estimation methods in this scenario, and the usual standard errors, which assume a common variance, can be wrong.
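A quick simulation shows the mean-variance relationship. This is a sketch under stated assumptions: the counts are drawn from Poisson distributions, and the three group means (2, 8, and 20) and the group size of 2000 are hypothetical values chosen for illustration. `draw_poisson` is a small helper written for this example using Knuth's multiplication method, so that only the standard library is needed.

```python
import math
import random
from statistics import mean, variance


def draw_poisson(lam, rng):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    # Keep multiplying uniforms until the running product drops to the threshold.
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical groups with increasingly large average counts.
group_means = {"low": 2.0, "medium": 8.0, "high": 20.0}

summary = {}
for name, lam in group_means.items():
    counts = [draw_poisson(lam, rng) for _ in range(2000)]
    summary[name] = (mean(counts), variance(counts))
    print(f"{name:>6}: sample mean = {summary[name][0]:.2f}, "
          f"sample variance = {summary[name][1]:.2f}")
```

The sample variance rises with the sample mean across the groups, which is exactly the pattern that conflicts with a single common variance. Real count data are often even more variable than the Poisson model implies, so this sketch probably understates the problem.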

Also, in what regard would ANCOVA be worse, statistical power or interpretation of results?

I'm not sure about statistical power, but the interpretation of the results is compromised in the ANCOVA case: you might report confidence intervals, significance tests, and so on, but they may be untrustworthy because of the assumption breaches.