Hello dear forum members,

Using county-level panel data (N = 2500+, T = 3), I aim to examine the associations between multiple biological, socioeconomic, and psychological factors and cancer incidence. Two outcome measures are available to me:

(1) cancer incidence rate, IR = (New cancers / Population) × 100,000
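To make the rate definition concrete, here is a minimal sketch of the calculation. The function name and the example county figures are hypothetical and are not taken from my data.

```python
def incidence_rate(new_cases, population, per=100_000):
    """Crude incidence rate: new cases per `per` residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return new_cases / population * per


# Hypothetical county: 250 new cases among 120,000 residents
rate = incidence_rate(250, 120_000)
print(round(rate, 2))  # 208.33
```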

M = 197.19, SD = 51.12, Min = 0, Max = 610.6, Skewness = 0.1, Kurtosis = 7.25

(2) count of new occurrences

M = 241.4, SD = 674.11, Min = 0, Max = 17,742, Variance = 454,427.7
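As a quick check on the count summary above, the sketch below computes the marginal variance-to-mean ratio from the reported figures. The variable names are my own. Note that this is only the unconditional ratio: it ignores covariates and differences in county population, so it is a rough indication of dispersion, not a formal test.

```python
# Summary statistics for the count outcome, as reported above
mean_count = 241.4
sd_count = 674.11
var_count = 454_427.7

# The reported variance should match SD squared up to rounding
assert abs(sd_count ** 2 - var_count) < 10

# Under a Poisson distribution the variance equals the mean,
# so a ratio far above 1 points to (marginal) overdispersion
dispersion_ratio = var_count / mean_count
print(f"variance / mean = {dispersion_ratio:.1f}")  # variance / mean = 1882.5
```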

My initial approach is to model the (continuous) rates using OLS. However, although the rate distribution is roughly symmetric (skewness = 0.1), its tails are heavier than those of a normal distribution (kurtosis = 7.25):

As a result, the OLS residuals are far from perfect:

Question 1: What modeling approach would you recommend to handle this tail behavior? I realize quantile regression is one option (with its pros and cons), but are there other "standard" ways to model rates?

Question 2: Is there any reason to prefer rates over counts, or counts over rates, as the outcome in this analysis? Is there a general consensus on this?

Your feedback would be greatly appreciated.
