Fixed vs. Random Effects

I'm wondering if someone could help me to correctly identify whether certain variables should be considered fixed or random effects.

Just as context, I have a dataset of people (with information about them that I know is FIXED).

I have a variable indicating which healthcare practice they belong to and visited (I know this to be RANDOM).

However, my question concerns information about the healthcare practice (things like the number of physicians working there, the size of the practice, etc.). These are things I can directly measure, which makes me believe they should be considered FIXED, but since they are information about a RANDOM effect, I'm not sure whether that means they need to be considered random as well. Probably a dumb question, but I just wanted to get some feedback.


Less is more. Stay pure. Stay poor.
Perhaps it would help to use the terms individual-level and group-level effects. Your fixed effects are at the patient level; they are patient characteristics. The hospital characteristics are at the group level. Patients are nested in hospitals (the groups).

Another question you need to address is whether to include random intercepts.
Let me provide a bit more context......

A bunch of healthcare practices received a survey that some colleagues and I created. The survey asks a series of questions, and each practice receives a single score based on its responses (i.e. each question has 4 choices, A=1 point, B=2 points, and so on, and we just add up the points). Aside from this survey score, we will know some practice info, such as the number of physicians/nurses who work there, the size of the practice, etc.

Each practice has patients who belong to it, and we will know lots of details about each patient (their prior 12-month cost, the number of inpatient admissions, presence of various diseases, age, sex, etc.).

The main question of interest is to look at the effect of that survey score on patient costs (the thought being that practices with higher scores will ultimately have lower patient costs). However, I obviously need to control for all the differences between practices and patients.

I've been reading about Hierarchical Linear Models (HLM), and at first glance this seems like it would help answer that question since (as mentioned by hlsmith) patients are nested in practices. For my purposes I'm only including patients who belonged to a single practice, so I'm not concerned with those who switch from one practice to another.

I should have 100,000 patients belonging to 200 or so different practices. Every member will also be followed for the same date range.

For simplicity's sake, I'll refer to the following:

PATIENT_DATA = all of the fixed effect variables, such as the patient's prior cost, prior utilization, demographic info, etc.

PRACTICE_DATA = all of the random effect variables, such as the unique practice identifier, the number of staff at a particular practice, the survey score of a practice, etc.

Y = the variable of interest, which in this case is the member's 12-month post costs (this is what is thought to be lowered by a practice having a higher survey score).

I think (heavy emphasis on think) I would want my SAS code to look something like:

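proc mixed data=data;
  class practice_id;  /* grouping variable that defines the subjects */
  model Y = patient_data practice_data / ddfm=bw residual;  /* placeholders for the variable lists */
  random intercept / subject=practice_id type=un;
run;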
I'm actually not totally sure whether my suggested code above is correct given all the parameters. For example, is it appropriate for the RANDOM EFFECTS variables (practice_data) to be in the MODEL statement, or do they need to be in the RANDOM statement?
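For reference, the PROC MIXED call as written above (patient and practice variables in the MODEL statement, plus a random intercept per practice) corresponds to a random-intercept model that could be written roughly as follows. The notation is an assumption introduced here, not something from the thread: i indexes patients, j indexes practices, x_ij stands for the PATIENT_DATA variables, and w_j stands for the measured PRACTICE_DATA variables, including the survey score.

```latex
\[
Y_{ij} = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij} + \boldsymbol{\gamma}^{\top}\mathbf{w}_{j} + u_j + \varepsilon_{ij},
\qquad u_j \sim N(0, \tau^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)
\]
```

In this form the only random term besides the residual is the practice-specific intercept u_j; the coefficients on both patient-level and practice-level variables are single fixed values.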


Less is more. Stay pure. Stay poor.
Will there be just a single observation per patient for the given year, so that there won't be multiple visits clustered within patients?

The first model you should run is an empty model with no predictors, to see how much of the variation in the outcome the groups explain by themselves.
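A minimal sketch of such an empty model, assuming the data set and variable names used earlier in the thread (data, Y, practice_id):

```sas
/* Empty (unconditional) model: no predictors, random intercept per practice */
proc mixed data=data covtest;
  class practice_id;
  model Y = / solution;                    /* intercept only */
  random intercept / subject=practice_id;  /* between-practice variance */
run;
```

The Covariance Parameter Estimates table then reports the between-practice (Intercept) variance and the Residual variance. Dividing the Intercept variance by the sum of the two gives the intraclass correlation, i.e. the share of the variation in Y that lies between practices.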

Yes, the group-level effects should also be in the RANDOM statement. Given your sample size, models will run faster if you sort by the group variable before running the model.
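The sorting step mentioned above could look like this, again assuming the data set is called data and the group variable is practice_id:

```sas
/* Sort by the grouping variable before calling PROC MIXED */
proc sort data=data;
  by practice_id;
run;
```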
Yes, there will be just 1 row per patient.

In all of the literature I've been reading, I have seen this empty model you mentioned, so I will definitely be sure to run that first (as well as doing lots of normal descriptive stuff to better understand the data). I also came across an article that mentioned sorting the data to make it run faster, so it's nice to have a second mention of that!

I will make sure to add the group-level effects within that RANDOM statement (I assume RANDOM intercept practice_data_var1 practice_data_var2..... / subject=practice_id type=un will work fine for this).

Thank you for all of your assistance!


No cake for spunky
The fixed versus random distinction is not quite as simple as it seems. The same data can be either fixed or random depending on your interpretation of it. For example, I have a data set that consists of all our members. If I am only concerned about our members, it is fixed (ignoring the issue of whether it can be part of a structured hierarchy). If, however, I see it as a random sample of possible customers (for example, customers over time, or people statewide who have similar conditions, including those who are not currently our customers), it can be considered random.
Sure, that makes sense. In my case I'm just keeping the same members over the same time period, so they are fixed.