# Proc HPREG class statement

#### noetsi

##### Fortran must die
```sas
proc hpreg data=WORK.REG;
CLASS '2FL'n '10FL'n / PARAM=REFERENCE REF=Last;
```

So if you have the levels 0 and 1, is 0 the last level, or is 1?

It appears that this only allows LAST or FIRST; you cannot set REF=1 or REF=0. I am not sure what "ordered" last or first means.
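As far as I understand the SAS documentation, CLASS levels are sorted before FIRST/LAST is applied (the default is ORDER=FORMATTED, and a numeric variable with no explicit format is sorted by its internal numeric value). For a 0/1 variable that makes 0 the first level and 1 the last, so REF=LAST makes 1 the reference. The Python sketch below mimics that ordering and the reference-coded dummy columns it implies; the function `reference_code` and its plain `sorted()` rule are my own simplification, not SAS's implementation.

```python
def reference_code(values, ref="last"):
    """Sketch of reference (dummy) coding for one CLASS-style variable.

    Levels are sorted by value (a simplification of SAS's default ordering
    for unformatted numeric variables). The reference level gets no column;
    every other level gets a 0/1 indicator column.
    """
    levels = sorted(set(values))
    if ref == "last":
        ref_level = levels[-1]
    elif ref == "first":
        ref_level = levels[0]
    else:
        raise ValueError("ref must be 'first' or 'last'")
    non_ref = [lev for lev in levels if lev != ref_level]
    # One row per observation, one indicator column per non-reference level.
    rows = [[1 if v == lev else 0 for lev in non_ref] for v in values]
    return ref_level, non_ref, rows


ref_level, columns, design = reference_code([0, 1, 1, 0, 1], ref="last")
print(ref_level)  # 1
print(columns)    # [0]
print(design)     # [[1], [0], [0], [1], [0]]
```

With REF=LAST the only column left is the indicator for level 0, so the printed estimate is "level 0 compared with level 1".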

#### noetsi

##### Fortran must die
I tried this in PROC GENMOD and it failed there too.

```sas
CLASS "10FL"n (REF =1) "2FL"n (ref=1)
;
```
This generates an error saying it will only take LAST or FIRST. I thought you could set the reference level to any valid level of the variable.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
If you don't know which level it is using, just look at the output; it will tell you. I am pretty sure you can set REF= in GENMOD, or at least work around it in its ESTIMATE statement!

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Here is an example of a piece of code I am using right now for a logistic regression. I think you just change DIST= to NORMAL and LINK= to IDENTITY and you should be good to go:

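```sas
proc genmod data = LWBS.full descending;
class
    RaceCat (ref='white')
    CCcat (ref='Abdom')
    HoursCat (ref='OpWeekdayHours');
model LWBS =
    age_v
    racecat
    CCcat
    HoursCat
    ED_Accuity_V
    Pulse_V
    Respirations_V
    Pulse_Ox_V
    Pain_Score_V
    / dist=binomial link=logit; /* MODEL needs its own semicolon; binomial/logit assumed for a logistic model */
/* exponentiated estimate for a 10-unit increase in age_v, with 99% limits */
estimate 'Age' age_v 10 / alpha=0.01 exp;
/* coefficients follow the order of the RaceCat levels in the Class Level Information table */
estimate 'Asian' RaceCat 1 0 0 0 0 -1 / alpha=0.01 exp;
run;
```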
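In that code, `estimate 'Age' age_v 10 / alpha=0.01 exp;` asks for the linear combination 10 × (age_v coefficient) and, because of EXP, its exponential; with a logit link that is the odds ratio for a 10-unit difference in age_v, and ALPHA=0.01 gives 99% limits. A small Python sketch of that arithmetic, assuming Wald-type limits and using a made-up coefficient and standard error (hypothetical numbers, not from hlsmith's model):

```python
import math
from statistics import NormalDist


def exp_estimate(coef, se, multiplier, alpha=0.01):
    """Exponentiate multiplier*coef with a Wald (1 - alpha) confidence interval."""
    est = multiplier * coef
    est_se = abs(multiplier) * se            # SE of a scalar multiple of one coefficient
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 2.576 for alpha=0.01
    return math.exp(est), math.exp(est - z * est_se), math.exp(est + z * est_se)


# Hypothetical values for illustration only.
odds_ratio, lower, upper = exp_estimate(coef=0.02, se=0.005, multiplier=10)
print(round(odds_ratio, 4), round(lower, 4), round(upper, 4))
```

Exponentiating the limits (rather than computing limits on the odds-ratio scale) is why such intervals are not symmetric around the odds ratio.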

#### noetsi

##### Fortran must die
It looks to me like it only takes FIRST and LAST, but maybe that is because I am setting it as the global REF= option of the CLASS statement. If the reference level is 1 (a number), do you write (ref=1) or (ref='1')?

#### noetsi

##### Fortran must die
When I run this code:

```sas
PROC GENMOD DATA=WORK.SORTTempTableSorted
PLOTS(ONLY)=ALL
;
CLASS "10FL"n (REF =1) "2FL"n (ref=1)
;
MODEL Q2Wage="2FL"n "10FL"n
/
;
```
I get this error:

```
ERROR 22-322: Syntax error, expecting one of the following: a quoted string, FIRST, LAST.
ERROR 200-322: The symbol is not recognized and will be ignored.
```
It turns out you have to write the level as a quoted string, which makes no sense to me since it is a numeric field, not a string:

```sas
PROC GENMOD DATA=WORK.SORTTempTableSorted
PLOTS(ONLY)=ALL
;
CLASS "10FL"n (ref ='1') "2FL"n (ref='1')
;
MODEL Q2Wage="2FL"n "10FL"n
/
;
RUN;
```
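My understanding from the SAS CLASS statement documentation is that the value in REF='level' is matched against the formatted value of the variable, and a formatted value is a character string even when the variable is numeric; that is consistent with the error above asking for "a quoted string". A rough Python sketch of that matching; `format_value` is a hypothetical stand-in for SAS's default numeric format (it is not the real BEST. algorithm), and `find_reference` is likewise illustrative:

```python
def format_value(value):
    """Hypothetical stand-in for a default numeric format: 1.0 -> '1', 0.5 -> '0.5'."""
    return f"{value:g}"


def find_reference(values, ref_text):
    """Return the raw level whose formatted text equals ref_text (e.g. '1')."""
    matches = {v for v in values if format_value(v) == ref_text.strip()}
    if not matches:
        raise ValueError(f"no level has formatted value {ref_text!r}")
    return matches.pop()


print(find_reference([0, 1, 1, 0], "1"))  # 1
```

The comparison is text against text, which is why the reference has to be given as '1' rather than 1.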
Not having used GENMOD before, I found the output confusing. Is this what it should look like?

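| Parameter | Level | DF | Estimate | Standard Error | Wald 95% Lower Limit | Wald 95% Upper Limit | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|---|---|---|
| Intercept | | 1 | 1686.360 | 26.9170 | 1633.603 | 1739.116 | 3925.05 | <.0001 |
| 2FL | 0 | 1 | 239.0933 | 57.7919 | 125.8233 | 352.3633 | 17.12 | <.0001 |
| 2FL | 1 | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | . | . |
| 10FL | 0 | 1 | 311.3500 | 51.7718 | 209.8790 | 412.8210 | 36.17 | <.0001 |
| 10FL | 1 | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | . | . |
| Scale | | 1 | 3189.663 | 15.6405 | 3159.155 | 3220.465 | | |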

So, for example, for variable 2FL the non-reference level (0) has a mean 239 larger than the reference level (1).
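The columns of that table are tied together: the Wald chi-square is (Estimate / Standard Error)², and the 95% limits are Estimate ± 1.96 × Standard Error. A quick Python check using the 2FL and 10FL rows above; it only re-derives the printed columns from the printed (rounded) estimates and is not how GENMOD computes anything internally:

```python
from statistics import NormalDist


def wald_summary(estimate, std_error, level=0.95):
    """Wald chi-square and confidence limits for one parameter."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    chi_square = (estimate / std_error) ** 2
    return chi_square, estimate - z * std_error, estimate + z * std_error


for name, est, se in [("2FL 0", 239.0933, 57.7919), ("10FL 0", 311.3500, 51.7718)]:
    chi2, lower, upper = wald_summary(est, se)
    print(name, round(chi2, 2), round(lower, 4), round(upper, 4))
```

Tiny differences in the last decimal come from the table's estimates already being rounded.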

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Generalized linear models typically use a version of maximum likelihood and let you state the distribution and link function, so I can use them to move between odds ratios (ORs), risk ratios (RRs), and risk differences (RDs). There can be convergence or appropriateness issues with the latter two, but the log will tell you about this.

Without seeing the rest of the output, it seems that 2FL=1 (with 10FL also at its reference level) has an expected mean of 1686, and 2FL=0 has an expected mean 239 higher, controlling for 10FL.
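With reference coding and no interaction term, each predicted mean is the intercept plus the estimates of whichever non-reference levels apply. A small Python sketch using the estimates from the table above; the `EFFECTS` dictionary and `predicted_mean` function are just an illustrative layout:

```python
INTERCEPT = 1686.360
# Level 0 is the non-reference level of each variable; level 1 (the reference) is fixed at 0.
EFFECTS = {"2FL": {0: 239.0933, 1: 0.0}, "10FL": {0: 311.3500, 1: 0.0}}


def predicted_mean(fl2, fl10):
    """Predicted Q2Wage for one combination of 2FL and 10FL levels."""
    return INTERCEPT + EFFECTS["2FL"][fl2] + EFFECTS["10FL"][fl10]


for fl2 in (0, 1):
    for fl10 in (0, 1):
        print(f"2FL={fl2} 10FL={fl10}: {predicted_mean(fl2, fl10):.4f}")
```

The 2FL=0 versus 2FL=1 gap is 239.0933 at either level of 10FL, because the model has no 2FL*10FL interaction; the intercept on its own is the prediction only when both variables are at their reference level, 1.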

What is up with the `"..."n` around the variable names? I haven't seen that before.

#### noetsi

##### Fortran must die
I don't understand that, hlsmith. The code makes 1 the reference level, so shouldn't level 0, not level 1, be equal to the intercept? I thought that is how reference levels worked.

But I think what you wrote is correct, based on other data I ran. It just seems strange to me for dummy variables to be done that way.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
In a model with one binary predictor, the intercept is the expected value for the reference level and the estimate is the difference for the other level. You get to set which level is the reference, though, so it can be whichever one you select!
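A quick way to see this: with a single 0/1 predictor under reference coding, the least-squares intercept equals the mean of the reference group and the estimate equals the difference between the two group means. A Python sketch with made-up wage values (hypothetical data) and level 1 as the reference, as with REF=LAST or ref='1'; `fit_one_dummy` is an illustrative name:

```python
from statistics import mean


def fit_one_dummy(x, y, ref=1):
    """OLS fit of y on an intercept plus one indicator for the non-reference level."""
    d = [0 if xi == ref else 1 for xi in x]  # the reference level gets no column
    d_bar, y_bar = mean(d), mean(y)
    slope = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y)) / sum(
        (di - d_bar) ** 2 for di in d
    )
    intercept = y_bar - slope * d_bar
    return intercept, slope


# Hypothetical data: three observations at level 0, four at level 1.
x = [0, 0, 0, 1, 1, 1, 1]
y = [2100.0, 1900.0, 2000.0, 1700.0, 1800.0, 1600.0, 1700.0]
intercept, slope = fit_one_dummy(x, y)
group_mean = {lev: mean(yi for xi, yi in zip(x, y) if xi == lev) for lev in (0, 1)}
print(intercept, group_mean[1])              # intercept vs. reference-group mean
print(slope, group_mean[0] - group_mean[1])  # estimate vs. level 0 minus level 1
```

So a positive estimate means the non-reference level (here 0) has the larger mean.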

#### noetsi

##### Fortran must die
To be clear, my variable takes on values of 0 and 1 in the data. This is a portion of the PROC GENMOD results:

2FL 0 239.0933
2FL 1 0.0000

Does this mean that, for this variable, those who are at level 0 in the raw data average 239 higher (controlling for the other variables) than those at level 1? It is the mean difference between levels I am concerned with.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
You didn't provide all of your code and output, but yes, it should.

#### noetsi

##### Fortran must die
I remain confused about the coding here.

This is the printout.

I know that whites are 1 in the source data and non-whites 0.

So the results are

So does this mean that the 0 group (non-whites) earns 533 dollars less than whites (an estimate of -533), controlling for the other variables in the model?

I am using REFERENCE, not GLM, coding:

```sas
CLASS 'Limited English-language profici'n 'Migrant and seasonal farmworker'n 'Race: Hawaiian/Pacific Islander'n 'Race: White'n 'Race: Black'n 'Psychosocial and psychological d'n 'Race: Asian'n 'Physical disability'n 'Postsecondary education no degre'n 'Low-income'n 'Long-term unemployed'n 'Age 16 to 18'n 'Intellectual and learning disabi'n 'Age 19 to 24'n 'Displaced homemaker'n 'High school diploma or equivalen'n 'Individuals has a significant di'n 'Individuals is most significant'n 'TANF recipient'n 'Special education certicate/comp'n 'Received public support at appli'n 'Single parent'n 'Received training services'n 'Received other services'n 'Received career services'n 'Homeless individual, runaway you'n 'Age 25 to 44'n 'Race: More than one'n 'Foster care youth'n Female 'Ethnicity-Hispanic Ethnicity'n 'Employed at application'n 'Age 45 to 54'n 'Age 55 to 59'n 'Age 60+'n 'Associate’s degree'n 'Auditory and communicative disab'n Veteran 'Bachelor’s degree'n 'Beyond a bachelor’s degree'n / PARAM=REFERENCE REF=Last;
MODEL Qtr2_Wage = 'Age 16 to 18'n 'Age 19 to 24'n 'Age 25 to 44'n 'Age 45 to 54'n 'Age 55 to 59'n 'Age 60+'n 'Associate’
```

#### noetsi

##### Fortran must die
What does this mean? I'm not sure what they mean by "singular" here. This is for HPREG:

• The GLM encoding estimates the difference in the effect of each level compared to the reference level. You can use the REF= option to specify the reference level. By default, the reference level is the last ordered level. The design matrix for the GLM encoding is singular.
• The REFERENCE encoding estimates the difference in the effect of each nonreference level compared to the effect of the reference level. You can use the REF= option to specify the reference level. By default, the reference level is the last ordered level. Notice that the REFERENCE encoding gives the same interpretation as the GLM encoding. The difference is that the design matrix for the REFERENCE encoding excludes the column for the reference level, so the design matrix for the REFERENCE encoding is (usually) nonsingular.
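"Singular" here means that some columns of the design matrix are exact linear combinations of others, so X'X cannot be inverted. With GLM encoding a 0/1 variable gets one indicator column per level, and those two columns always add up to the intercept column; REFERENCE encoding drops the reference level's column, which removes that dependency. A small pure-Python sketch that builds both design matrices for hypothetical data and compares the number of columns with the rank (`matrix_rank` is a plain Gaussian elimination written for illustration):

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank via Gaussian elimination on exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank, n_cols = 0, len(m[0])
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no usable pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


x = [0, 1, 1, 0, 1]  # hypothetical 0/1 CLASS variable
# GLM encoding: intercept, indicator for level 0, indicator for level 1.
glm_x = [[1, int(v == 0), int(v == 1)] for v in x]
# REFERENCE encoding with REF=LAST: level 1 gets no column.
ref_x = [[1, int(v == 0)] for v in x]

print("GLM:", len(glm_x[0]), "columns, rank", matrix_rank(glm_x))
print("REFERENCE:", len(ref_x[0]), "columns, rank", matrix_rank(ref_x))
```

As I understand it, when the GLM-encoded matrix is rank deficient the procedure sets the redundant reference parameter to zero, which is why GENMOD (whose default is PARAM=GLM) prints the level-1 rows as 0.0000 with 0 DF; under REFERENCE encoding that parameter is simply not in the model.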

#### hlsmith

##### Less is more. Stay pure. Stay poor.
If you have multiple class variables, though, the intercept corresponds to the group that is at the reference level of every variable. So your interpretation is likely right.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Do you know what a design matrix is?

#### noetsi

##### Fortran must die
To be sure: is the estimate they are showing the difference between 0 and 1, so that if it is positive, 0 will be larger than 1?

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Yes, given the information you provided.

#### noetsi

##### Fortran must die
Thanks. I'm not used to PROC GLM; I am starting to use it based on your comments. I always used PROC REG before, which of course has no CLASS statement.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Yeah, I have really liked GLMs lately. I hadn't really used them before for linear or logistic models, only for count data.