Logistic regression with predictor variables with both categorical & actual values

#1
I've got a data set with 40 variables. In 30 of them, some observations are coded (i.e. have values) from 1-6 to describe certain situations, while the other observations hold real values for what the variable represents. In other words, those variables are partially categorical. This issue relates only to the independent/predictor variables.

I wish to run a logistic regression; the dependent variable is binary.

Could someone please let me know how to proceed on this issue and what the actual code would be in generic terms? Below is roughly what I think the code would look like, but your input would be greatly appreciated.


proc logistic data="c:\Documents\Dataset" descending;
class var1 var2 / param=ref ;
model good = var1-var40;
run;

Thank you !!!!!!
 

noetsi

No cake for spunky
#2
Re: Logistic regression with predictor variables with both categorical & actual value

I don't understand how a variable can be partially categorical. It either is categorical or it is not. You should generally create dummy variables out of categorical variables, although I am not certain I understand what these predictors are.

The code you show is an older way of doing PROC LOGISTIC; generally you would not use the DESCENDING option anymore. Instead you specify the EVENT you are modeling, although the result is usually the same. There are vast numbers of options for PROC LOGISTIC, and I show the code without most of them. Note that this generates a wide range of graphs using ODS GRAPHICS, which you may or may not want.


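ODS GRAPHICS ON;

PROC LOGISTIC DATA=SASUSER.PRACTICE2 PLOTS(ONLY)=ALL;
    /* REF= names the reference level of each class variable */
    CLASS CLO (REF='HS') Area (REF="Area 2") / PARAM=REF;
    /* EVENT= names the response level being modeled; STB adds standardized estimates */
    MODEL DDV (EVENT='1') = CLO Area Age DQ10 / STB;
RUN;

ODS GRAPHICS OFF;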

REF states what the reference level is for a variable rather than letting SAS choose a default. This avoids confusion with dummy variables. STB generates standardized coefficients, which are useful for comparing variables (for class variables this only shows up in the type III analysis shown).
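
Applied to the original poster's model, the EVENT= form might look like the sketch below. It assumes the response good is coded 0/1, that the predictors are literally named var1 through var40, and that the data sit in a data set called work.mydata (a hypothetical name). For a binary response, EVENT='1' models the same level that DESCENDING would select.

```sas
/* Sketch only: work.mydata and the var1-var40 naming are assumptions. */
proc logistic data=work.mydata;
    /* var1 and var2 are treated as categorical with reference coding */
    class var1 var2 / param=ref;
    /* model the probability that good = 1 */
    model good(event='1') = var1-var40;
run;
```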
 

Dason

Ambassador to the humans
#3

"I don't understand how a variable can be partially categorical. It either is categorical or it is not."

What do you consider censored data?
 

noetsi

No cake for spunky
#4

I am not sure what you mean by censored data as I don't work with it. But if the data has a certain number of unique levels (say 12 or more) and it is ordered, then it is commonly assumed that you can treat it as interval even though that does not meet the formal requirements of such. Categorical data usually has only a few levels; I have rarely seen a categorical variable with more than 7 distinct levels in SAS (well, in research, period).
 

Dason

Ambassador to the humans
#5

Maybe you have the actual response, but if the true response was larger than 100 the measurement itself will read 100. So your real variable is continuous, but there is also a category of ">= 100" that it could fall into. Things aren't always quite as neat and clean as we would like them to be.
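
One possible way to carry both pieces of information into a model is to keep the numeric reading and add a 0/1 flag for the capped category. The sketch below is only an illustration of that idea, not a recommendation from the thread; the data set names work.raw and work.flagged and the variables x and x_at_limit are hypothetical.

```sas
/* Sketch only: all names here are hypothetical. */
data work.flagged;
    set work.raw;
    /* 1 when the reading sits at the ceiling of 100, 0 when it is an actual value */
    if missing(x) then x_at_limit = .;
    else x_at_limit = (x >= 100);
run;
```

Both x and x_at_limit could then be listed as predictors, so the capped readings get their own effect instead of being treated as a true value of 100.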
 
#6

Thanks for replying, noetsi. The variables have some values that are coded (-1, -2, -3, -4, -5, -6); these codes represent special cases where the individual's record was either not found, or was found with discrepancies, and so on. For other observations, the same variable holds the actual values that it is meant to represent. That is why the variables in question are partly categorical.

Thanks for the code. What I need more clarification on is what the REF option refers to as the reference number/level. Are you saying that by specifying that option, we tell SAS to model DDV's Event=1 against CLO's HS level and against Area's Area 2 level? What's the reason for doing that?

Another option that I saw was pprob=(0 to 1 by 0.1). Are you familiar with what this option does and why we use it? All I understand is that it increases the probability from 0 to 1 in steps of 0.1, but for what and why?

Thanks again
 

noetsi

No cake for spunky
#7

Why not simply treat the ones with the codes -1 to -6 as missing values (which is what they seem to be to me)? Unless you have very few such cases, using values like these to stand in for missing data is not a great idea IMHO. If you want to deal with missing data it would be wiser to use multiple imputation (I think SAS does this, although I forget the PROC). Or simply set the values to missing.
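
A minimal sketch of both suggestions follows, assuming the 30 partly coded predictors are numeric and named var1 through var30, and using hypothetical data set names (work.raw, work.clean, work.imputed). The multiple imputation procedure in SAS/STAT is PROC MI; the seed and number of imputations below are arbitrary choices for illustration.

```sas
/* Step 1: turn the special codes -1 ... -6 into SAS missing values.
   Assumes var1-var30 are the partly coded numeric predictors. */
data work.clean;
    set work.raw;
    array preds {*} var1-var30;
    do i = 1 to dim(preds);
        if preds{i} in (-1, -2, -3, -4, -5, -6) then preds{i} = .;
    end;
    drop i;
run;

/* Step 2 (optional): create 5 imputed copies of the data.
   The copies are stacked in work.imputed and indexed by _Imputation_. */
proc mi data=work.clean out=work.imputed nimpute=5 seed=12345;
    var var1-var30;
run;
```

Note that PROC LOGISTIC drops any observation with a missing predictor, so stopping after step 1 can shrink the sample considerably. After step 2, the usual pattern is to fit the model BY _Imputation_ and combine the estimates with PROC MIANALYZE.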

REF tells SAS which level of a categorical variable is left out of the dummy coding. That level becomes the baseline reflected in the intercept. The reason for doing this is that it commonly makes conceptual sense to compare the other levels of a categorical variable against one particular level. Say there are five levels of education and you believe one of these (college) has a specific effect. Then by using REF='college' you can directly compare the other levels to it. If you don't do this, SAS will assign one of the levels, probably not the one you want, as the reference level.
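
The education example could be sketched as below. The data set work.survey and the variables educ and outcome are hypothetical, and outcome is assumed to be coded 0/1.

```sas
/* Sketch only: work.survey, educ and outcome are hypothetical names. */
proc logistic data=work.survey;
    /* 'college' is the baseline; it gets no dummy variable of its own */
    class educ (ref='college') / param=ref;
    model outcome(event='1') = educ;
run;
```

With PARAM=REF, each estimate reported for educ is the difference in log odds between that level and college. If REF= is omitted, the CLASS statement defaults to the last level in sorted order.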

I have never seen the pprob option in PROC LOGISTIC. I believe that SAS has an option that incrementally changes the level of a variable, but I am not familiar with it.
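
For reference, PPROB= is an option on the MODEL statement that works together with CTABLE: it lists the probability cutpoints at which observations are classified as events or nonevents. A sketch using the original poster's response good, with a hypothetical data set name and an arbitrary pair of predictors:

```sas
/* Sketch only: work.mydata is a hypothetical name; good is assumed 0/1. */
proc logistic data=work.mydata;
    /* CTABLE requests a classification table;
       PPROB= evaluates it at cutpoints 0, 0.1, ..., 1 */
    model good(event='1') = var3 var4 / ctable pprob=(0 to 1 by 0.1);
run;
```

Each row of the resulting table shows, for one cutpoint, how many observations are classified correctly and incorrectly, along with sensitivity and specificity, which helps when choosing a threshold for turning predicted probabilities into yes/no predictions.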