# Categorical multiple linear regression analysis - what would you do?

#### gliomanerd

##### New Member
Noob here. I'm doing a project to determine how different factors (race, insurance status) are related to distance from pediatric urologists.

I'm thinking of tabulating the mean of each of these factors for several successive distance brackets, e.g. 0-10 miles, 10-20 miles, etc. So distance isn't really continuous, even though it is my dependent variable. Would you use multiple linear regression for this? Could I simply assign a number to each bracket and interpret it that way, e.g. 0-10 miles = 1, 10-20 miles = 2?
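The bracket coding described above can be sketched as follows. This is only an illustration: the half-open brackets (a distance of exactly 10 miles falls in the second bracket) and the function name `bracket_code` are assumptions, not something stated in the post. The resulting codes are ordinal labels; using them as plain numbers in a linear regression assumes the brackets are equally spaced in effect.

```python
import math

def bracket_code(distance_miles, width=10.0):
    """Return a 1-based ordinal code for a half-open bracket [k*width, (k+1)*width)."""
    if distance_miles < 0:
        raise ValueError("distance must be non-negative")
    return int(math.floor(distance_miles / width)) + 1

# Made-up distances in miles, for illustration only.
distances = [3.2, 10.0, 19.9, 27.5]
codes = [bracket_code(d) for d in distances]
print(codes)  # [1, 2, 2, 3]
```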

Does it make sense to test whether these factors are predictive of distance? I'm simply trying to see if there are any correlations between distance and these factors, and so far MLR seems to be the way to go.

#### kiton

##### Member
Regression analysis lets you estimate how a dependent variable changes with one or more predictors (although it does not by itself establish causality), whereas correlation only indicates whether variables are related or not. What is your goal?

With distance "as is" you are looking at a linear regression. However, if you transform it into "brackets", your DV would be on an ordinal scale. Therefore, you will be looking at a completely different model (e.g., ordered logistic regression). Now, when you say MLR -- do you mean multinomial logistic regression? If so, note that it deals with a nominal (unordered categorical) outcome, not an ordinal one.
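To make the ordered logistic idea concrete, here is a minimal sketch of how a proportional-odds model turns a linear predictor into bracket probabilities. The cutpoints and the value of `eta` are made up for illustration, and the parameterization `P(Y <= k) = logistic(cutpoint_k - eta)` is one common convention; software packages differ in sign conventions.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def ordered_logit_probs(eta, cutpoints):
    """Category probabilities under a proportional-odds model.

    P(Y <= k) = logistic(cutpoint_k - eta); cutpoints must be increasing.
    """
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for cum in cumulative:
        probs.append(cum - previous)  # P(Y = k) = P(Y <= k) - P(Y <= k-1)
        previous = cum
    return probs

# Hypothetical thresholds separating four distance brackets.
cutpoints = [-1.0, 0.5, 2.0]
probs = ordered_logit_probs(eta=0.3, cutpoints=cutpoints)
print(probs)
```

A larger `eta` shifts probability toward the farther brackets, which is how the predictors act on an ordinal outcome in this model.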

Why don't you run an ANOVA with a continuous DV and two categorical predictors?
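As a rough illustration of what an ANOVA computes, the sketch below works out a one-way F statistic by hand. It is a simplification of the suggestion above: it uses a single categorical factor rather than two, and the distances and the three insurance categories are invented for the example.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up distances (miles) for three hypothetical insurance categories.
distance_by_insurance = [
    [4.0, 6.0, 5.0],
    [9.0, 11.0, 10.0],
    [15.0, 14.0, 16.0],
]
f_stat, df_b, df_w = one_way_anova_f(distance_by_insurance)
print(f_stat, df_b, df_w)  # 75.0 2 6
```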

#### gliomanerd

##### New Member
Thanks for your help. I want to see if race, insurance status, etc. are predictive of distance from surgeons. So I guess that means doing a regression analysis. I think what you suggested, ordered logistic regression, is the best way for me to go. (What do you think? Does it make more sense to determine correlation, as opposed to predictive value?)

I can't do a continuous distance because my data is not that fine. I am relying on zip code tabulation areas, and they can vary in size.

I know this is a stupid question, but can my predictors themselves be means? For example, the dependent variable is distance. I am calculating the average race of all ZCTAs in each distance bracket. This average has its own mean and SD. Do I just disregard this SD when conducting my ordered logistic regression?

#### kiton

##### Member
> Thanks for your help. I want to see if race, insurance status, etc. are predictive of distance from surgeons. So I guess that means doing a regression analysis. I think what you suggested, ordered logistic regression, is the best way for me to go. (What do you think? Does it make more sense to determine correlation, as opposed to predictive value?)

It is not one or the other; researchers typically look at both. Correlation indicates whether your variables are related, whereas regression estimates what effect one has on another.
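A small sketch of the difference, using invented numbers: the correlation coefficient is a unit-free measure of linear association, while the regression slope is expressed in units of the outcome per unit of the predictor. The variable names and data below are hypothetical, and neither quantity by itself shows causation.

```python
import math

def pearson_r_and_slope(x, y):
    """Return (Pearson r, least-squares slope of y on x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy), sxy / sxx

# Made-up data: percent uninsured in an area vs. distance in miles.
pct_uninsured = [10.0, 20.0, 30.0, 40.0, 50.0]
distance_miles = [5.0, 9.0, 12.0, 18.0, 21.0]

r_miles, slope_miles = pearson_r_and_slope(pct_uninsured, distance_miles)

# Rescaling the outcome changes the slope but leaves the correlation alone.
distance_tenths = [d * 10.0 for d in distance_miles]
r_tenths, slope_tenths = pearson_r_and_slope(pct_uninsured, distance_tenths)
print(r_miles, slope_miles, slope_tenths)
```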

> I can't do a continuous distance because my data is not that fine. I am relying on zip code tabulation areas, and they can vary in size.

Can you please clarify "data is not that fine" -- what bothers you exactly?

> I know this is a stupid question, but can my predictors themselves be means? For example, the dependent variable is distance. I am calculating the average race of all ZCTAs in each distance bracket. This average has its own mean and SD. Do I just disregard this SD when conducting my ordered logistic regression?

Is there a specific reason to go this route? Based on your goal, ANOVA seems appropriate.

#### gliomanerd

##### New Member
Thanks for your help. After thinking about it, I think correlation would be better for me. I went ahead and calculated the distance of each ZCTA from the nearest surgeon. That way my response variable is continuous.
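One way such a distance could be computed is sketched below. The post does not say how it was actually done; this assumes each ZCTA is represented by a centroid and that straight-line (great-circle) distance is acceptable. All coordinates are invented, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, approximate

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def nearest_distance(zcta_centroid, surgeon_locations):
    """Distance from one ZCTA centroid to the closest surgeon location."""
    lat, lon = zcta_centroid
    return min(haversine_miles(lat, lon, s_lat, s_lon) for s_lat, s_lon in surgeon_locations)

# Hypothetical coordinates, for illustration only.
zcta_centroid = (40.0, -75.0)
surgeons = [(41.0, -75.0), (40.5, -75.0)]
nearest = nearest_distance(zcta_centroid, surgeons)
print(round(nearest, 1))
```

Because ZCTAs vary in size, a centroid distance is only an approximation for the residents of a large area.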

I am trying to use the ANOVA General Linear Model in Minitab, and it shows a p-value of 0.000 for basically every variable. I also see a large number of "Lack-of-Fit" and "Pure error" entries. Why is this?
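Some background that may help with reading that output, offered as a hedged sketch rather than a diagnosis of this dataset. A displayed p-value of 0.000 is a rounded value (it means p < 0.0005), which is common with a large number of ZCTAs. The "Lack-of-Fit" and "Pure error" rows are a split of the residual sum of squares that is only possible when some rows share identical predictor values (replicates). The example below uses a simple straight-line fit on made-up data to show that split.

```python
from collections import defaultdict

def lack_of_fit_decomposition(x, y):
    """Return (SSE, SS pure error, SS lack-of-fit) for a straight-line fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Total residual sum of squares around the fitted line.
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

    # Group replicate observations that share the same x value.
    by_x = defaultdict(list)
    for a, b in zip(x, y):
        by_x[a].append(b)

    # Pure error: spread of replicates around their own group mean.
    ss_pure = sum((b - sum(v) / len(v)) ** 2 for v in by_x.values() for b in v)
    # Lack of fit: distance of each group mean from the fitted line.
    ss_lof = sum(len(v) * (sum(v) / len(v) - (intercept + slope * a)) ** 2
                 for a, v in by_x.items())
    return sse, ss_pure, ss_lof

# Made-up data with two replicates at each x value.
x = [1, 1, 2, 2, 3, 3]
y = [2.0, 2.4, 2.9, 3.1, 5.0, 5.2]
sse, ss_pure, ss_lof = lack_of_fit_decomposition(x, y)
print(sse, ss_pure, ss_lof)
```

The two pieces add up to the residual sum of squares; a large lack-of-fit component relative to pure error suggests the model form misses some structure in the data.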