Help building a model in SAS

I need to build a model in SAS to look at how certain scores (one continuous variable) vary by obesity (yes = 1, no = 0), HTN (yes = 1, no = 0), and/or DMII (yes = 1, no = 0).

I have built the data set but I am having a hard time thinking through how to build this model. Can anyone help?


Fortran must die
If you tell me the method, I can suggest the PROC. There is never just one way to do anything in SAS. :)

There are a lot of methods that could analyze the data and theory you are suggesting. It's hard to pick a PROC until you provide more details.
@noetsi @fed2 Well, I am not sure what details you need, but here is what I'll go with. If you tell me what else you need, I am happy to provide my best answer.

I want to look at how skin carotenoid scores vary by obesity, hypertension, and diabetes status in a cohort of roughly 120 people. As I mentioned, individuals are coded for obesity (BMI), HTN, and DMII as 1 for yes and 0 for no. I will need to look at all main effects as well as the interactions, because we have people with every combination of diagnoses (and quite a few with none). I hypothesize that carotenoid scores will be higher in individuals with obesity alone and lower in individuals with hypertension or diabetes. Overall, I want to see whether carotenoid scores (a measure of the deposition of phytonutrients that behave as antioxidants) are higher in individuals with high fat stores (carotenoids are fat-soluble, so deposition could be greater) but lower in individuals with inflammatory disease states such as HTN and DMII. What else do you need?


Fortran must die
How is the dependent variable coded? Is it simply whether or not a person has skin carotenoids (1 or 0)? If so, logistic regression is probably best, although economists like linear probability models instead. It all depends on how it is coded.

If it is logistic regression, then PROC LOGISTIC is what you want.
@noetsi The skin carotenoid score is a value between 0 and 1000 for each individual. We hypothesize higher values for individuals with no inflammatory diagnoses (HTN/DMII) and for individuals with obesity but without HTN/DMII, and lower values for individuals with HTN, DMII, or both.


Fortran must die
If that is not an artificial scale, so that 500 is greater than 499 and 501 is greater than 500 on the skin carotenoid dimension, then you can probably use linear regression or ANCOVA if the other assumptions are met (that is, the formal regression assumptions). Formally it is not an unbounded interval variable, since it has a start and an end point, but in practice I don't think that will matter unless you get a lot of values near 0 and 1000 (which is a problem for any model).
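As a sketch of what a linear regression with all the main effects and interactions described earlier in the thread looks like, here is the full factorial model. It writes $Y_i$ for person $i$'s carotenoid score and uses OB, HTN, and DM for the 0/1 indicators (the names given later in the thread); the $\beta$ labels are introduced here only for illustration.

$$
Y_i = \beta_0 + \beta_1\,\mathrm{OB}_i + \beta_2\,\mathrm{HTN}_i + \beta_3\,\mathrm{DM}_i
+ \beta_4\,\mathrm{OB}_i\mathrm{HTN}_i + \beta_5\,\mathrm{OB}_i\mathrm{DM}_i + \beta_6\,\mathrm{HTN}_i\mathrm{DM}_i
+ \beta_7\,\mathrm{OB}_i\mathrm{HTN}_i\mathrm{DM}_i + \varepsilon_i
$$

Because every predictor is 0/1, the eight coefficients give each of the $2^3 = 8$ diagnosis combinations its own mean: $\beta_0$ is the mean score for people with no diagnoses, $\beta_0 + \beta_1$ is the mean for obesity alone, and $\beta_0 + \beta_2$ is the mean for hypertension alone. The "higher with obesity alone" hypothesis then corresponds to $\beta_1 > 0$.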
@noetsi Yes, it is NOT an artificial scale, and we do not get many values near 0 and none near 1000. Can you help me with SAS code for running this?

This is my input code, followed by the variable names:

INFILE 'C:\Users\Jessica\Desktop\Manuscripts Abstracts Research\VM_ob_HTN_DM2\data set VM_OB_HTN_DM2.txt';

Where VM is skin carotenoid content, OB is obesity (0 or 1), HTN is hypertension (0 or 1) and DM is diabetes (0 or 1)
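The thread only shows the INFILE statement. A minimal sketch of a full DATA step built around it follows. The data set name VM_DATA is hypothetical, and the sketch assumes one person per line, no header row, and the four values in the order VM OB HTN DM separated by blanks or tabs. The PROC FREQ step lists how many people fall into each diagnosis combination, which matters when a model includes every interaction.

```sas
/* Sketch: read the text file into a SAS data set.                        */
/* Assumptions, not from the thread: data set name VM_DATA (hypothetical),*/
/* no header row, one person per line, columns in the order VM OB HTN DM, */
/* separated by blanks or tabs.                                           */
data vm_data;
   infile 'C:\Users\Jessica\Desktop\Manuscripts Abstracts Research\VM_ob_HTN_DM2\data set VM_OB_HTN_DM2.txt'
          dlm='0920'x      /* tab (09) or blank (20) separates values    */
          truncover;       /* a short line does not spill into the next  */
   input VM OB HTN DM;     /* list input, all four read as numeric       */
run;

/* Check the read: counts, missing values, means, and ranges */
proc means data=vm_data n nmiss mean min max;
   var VM OB HTN DM;
run;

/* Number of people in each OB x HTN x DM combination */
proc freq data=vm_data;
   tables OB*HTN*DM / list missing;
run;
```

If the first row of the file holds column names, adding FIRSTOBS=2 to the INFILE statement skips it.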


Fortran must die
I use Enterprise Guide, so I rarely use DATA steps. Instead I use PROC SQL and import statements. If you are going to do statistics, I strongly suggest using Enterprise Guide, which you should have if you have regular SAS. You will find the GUI much, much easier to use than writing code from scratch.
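For reference, a sketch of the import-statement route, using PROC IMPORT on the file from the INFILE line above. It assumes the file is tab-delimited and that its first row holds the variable names VM, OB, HTN, and DM; the output name VM_DATA is hypothetical.

```sas
/* Sketch: PROC IMPORT instead of a DATA step.                             */
/* Assumptions, not from the thread: tab-delimited file, first row holds   */
/* the variable names, output data set VM_DATA (hypothetical name).        */
proc import datafile='C:\Users\Jessica\Desktop\Manuscripts Abstracts Research\VM_ob_HTN_DM2\data set VM_OB_HTN_DM2.txt'
            out=work.vm_data
            dbms=tab        /* tab-delimited text file                  */
            replace;        /* overwrite VM_DATA if it already exists    */
   getnames=yes;            /* take variable names from the first row    */
run;
```

If the file has no header row, GETNAMES=NO makes PROC IMPORT name the columns VAR1 to VAR4, which would then need renaming.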

You have not, as far as I can tell, decided whether you are going to run ANOVA or regression (linear regression, I think, since you have an interval dependent variable). I cannot help with ANOVA, only regression. That would be PROC REG in this case, or possibly PROC GLM. I can pull down code for that tonight; I am still working on my work code right now. PROC REG is much older and does not have a CLASS statement, which is useful in this case. That is a reason to use PROC GLM (which I know much less well). But if you use PROC GLM, you will need to do the diagnostics in PROC REG, since many of them do not exist in PROC GLM (SAS is lazy and not stressing stats these days anyhow, I personally believe).
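A minimal PROC GLM sketch of the CLASS-statement approach mentioned here, assuming the data are already in a data set named VM_DATA (a hypothetical name) with variables VM, OB, HTN, and DM. The bar notation OB|HTN|DM expands to all three main effects plus every two-way and three-way interaction, and the Type III table gives an F test and p-value for each of those terms. The OUTPUT statement keeps per-person residual diagnostics that PROC GLM does provide.

```sas
/* Sketch: full factorial linear model in PROC GLM.                       */
/* Assumes a data set VM_DATA (hypothetical name) with VM, OB, HTN, DM.   */
ods graphics on;

proc glm data=vm_data plots=diagnostics;   /* residual diagnostics panel  */
   class OB HTN DM;                 /* treat the 0/1 codes as groups      */
   model VM = OB|HTN|DM / ss3;      /* main effects plus all interactions */
   output out=glm_diag              /* per-person diagnostics             */
          p=pred r=resid rstudent=rstud cookd=cooksd;
run;
quit;

ods graphics off;
```

With about 120 people split over eight diagnosis combinations, some cells may be small; if any combination is empty, the three-way interaction cannot be fully estimated.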

We have an agency we report to that continually makes doubtful statistical calls. We can complain about them, but then we have to do them anyway.
@noetsi Does that mean you cannot help me? I don't know how to use Enterprise Guide; I am clearly a beginner at SAS. I just need a statement to run the code, and the SAS support center and everything I find online is written in a way that assumes coding knowledge I don't have. I simply need p-values for whether there is a significant difference in skin carotenoid scores among people with obesity, hypertension, diabetes, or some combination. I don't know how such a simple question can be so difficult. STATS, ARG!
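For the p-values asked for here, one sketch is to compare the mean score of every diagnosis combination with LSMEANS in PROC GLM. It assumes a data set named VM_DATA (hypothetical) with VM, OB, HTN, and DM. PDIFF prints a p-value for each pair of combinations (for example, obesity only vs. no diagnoses), and ADJUST=TUKEY corrects those p-values for the number of comparisons; SAS applies the Tukey-Kramer form when group sizes differ.

```sas
/* Sketch: pairwise p-values between diagnosis combinations.              */
/* Assumes a data set VM_DATA (hypothetical name) with VM, OB, HTN, DM.   */
proc glm data=vm_data;
   class OB HTN DM;
   model VM = OB|HTN|DM;
   /* one least-squares mean per OB x HTN x DM combination;               */
   /* PDIFF: pairwise p-values, ADJUST=TUKEY: multiple-comparison         */
   /* adjustment, CL: confidence limits                                   */
   lsmeans OB*HTN*DM / pdiff adjust=tukey cl;
   ods output LSMeans=cell_means;   /* save the group means as a data set */
run;
quit;
```

The LSMeans table numbers each combination, and the matrix of p-values uses those numbers for its rows and columns.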


Fortran must die
I can't help with the data or import steps. I can send you code for linear regression once you have imported the data, if you are doing linear regression.


Fortran must die
There are thousands of options for PROC REG. This is the base code.
You need a data set called fitness (or whatever you choose to name your data set).

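ods graphics on;
proc reg data=fitness plots=(partial dfbetas dffits);   /* several plot requests need parentheses */
model Oxygen = Runtime Weight Age / partial;            /* partial regression leverage plots */
output out=diag p=pred r=resid cookd=cooksd;            /* save predictions, residuals, Cook's D */
run;
quit;
ods graphics off;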

The dependent variable is on the left of the equal sign; here it is Oxygen. There is no CLASS statement because PROC REG is old code.
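Adapting that base code to the variables in this thread is one way to get the full factorial model in PROC REG. Because PROC REG has no CLASS statement and its MODEL statement accepts only variable names (not terms such as OB*HTN), the interaction terms are computed in a DATA step first. The sketch assumes a data set named VM_DATA (hypothetical) with VM and the 0/1 variables OB, HTN, and DM; the names VM_REG, REG_DIAG, and the product-term names are also hypothetical.

```sas
/* Sketch: full factorial model in PROC REG with hand-made interactions.  */
/* Assumes a data set VM_DATA (hypothetical name) with VM and 0/1 coded   */
/* OB, HTN, DM.                                                           */
data vm_reg;
   set vm_data;
   OB_HTN    = OB*HTN;      /* two-way interactions   */
   OB_DM     = OB*DM;
   HTN_DM    = HTN*DM;
   OB_HTN_DM = OB*HTN*DM;   /* three-way interaction  */
run;

ods graphics on;

proc reg data=vm_reg plots=(diagnostics dfbetas dffits);
   model VM = OB HTN DM OB_HTN OB_DM HTN_DM OB_HTN_DM;
   output out=reg_diag p=pred r=resid rstudent=rstud
          cookd=cooksd dffits=dffit h=leverage;
run;
quit;

ods graphics off;
```

With 0/1 coding, the intercept is the mean score for people with none of the three diagnoses, and each coefficient's t test gives a p-value. The fitted values match a CLASS-based PROC GLM fit of OB|HTN|DM; only the parameterization differs, because PROC GLM uses the last CLASS level (here 1) as its reference.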