Hi Everyone,
I am working on a logistic regression problem (injury [y/n] predicted by air temperature and physical training program). The problem is that only 1.2% of the entire sample of 10,000 had an injury, and this 1.2% is almost equally distributed between the two PT programs. Looking at the raw data, temperature seems to be much more influential on injury than PT program. Besides, both PT programs had 1.2% of individuals with an injury. When I run the logistic regression (2 factors, no interaction), it says that PT program is actually more influential than temperature (PT OR = 1.9, temp OR = 1.4). Then if I add an interaction, the results are odd: the PT OR = 29811, but the interaction is slightly significant, although the difference between programs is more in magnitude than in direction. I am just lost and don't know if I'm running in circles, and I don't know if it is because the proportion of 'yes' is just too small. ANY SUGGESTIONS???
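In case it helps anyone answering, here is a rough sketch of how I understand the ORs from the interaction model would be read. It assumes the usual parameterization logit(p) = b0 + b1*program + b2*temp + b3*program*temp, so the PT OR on its own would be the program effect at temperature = 0, and the program OR at any other temperature would be exp(b1 + b3*temp). Only the 29811 comes from my output; the function name, the interaction OR of 0.88, and the temperatures are hypothetical placeholders for illustration.

```python
import math


def program_or_at_temp(or_program_at_zero, or_interaction_per_unit, temp):
    """Odds ratio for one PT program vs the other at a given temperature.

    Assumes logit(p) = b0 + b1*program + b2*temp + b3*program*temp,
    so the program log-odds difference at temperature t is b1 + b3*t.
    """
    b1 = math.log(or_program_at_zero)       # program effect at temp = 0
    b3 = math.log(or_interaction_per_unit)  # change in that effect per unit of temp
    return math.exp(b1 + b3 * temp)


# 29811 is the PT OR reported above. The interaction OR (0.88) and the
# temperatures below are HYPOTHETICAL placeholders, not values from my output.
for t in (0, 60, 80, 100):
    print(t, round(program_or_at_temp(29811, 0.88, t), 2))
```

If that reading is right, the huge PT OR may only describe a temperature of 0, which could be far outside the observed range, and centering temperature before fitting might make the main effect easier to interpret.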