Multilevel model: dependent variable is not truly interval

noetsi

Fortran must die
#1
I want to run a multilevel regression. My dependent variable is a four-point Likert scale (as are the predictors in most cases). Can you run multilevel models with a DV that is not classically interval? (Of course, some argue a Likert scale is effectively interval, but for the moment I am not going there.)

If you cannot readily do multilevel modeling with a 4-point DV, it would also help to know what can analyze this. I am testing the effect of area on satisfaction.
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
I saw examples online for multilevel ordered logistic in Stata and Stan, but not one for SAS yet.
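For what it's worth, SAS can fit a multilevel ordinal (cumulative-logit) model with PROC GLIMMIX. A minimal sketch only — the dataset and variable names (`survey`, `satisfaction`, `x`, `area`) are placeholders, not from this thread:

```sas
/* Multilevel proportional-odds (cumulative logit) model with a      */
/* random intercept for area. All names below are hypothetical.      */
proc glimmix data=survey;
  class area;
  model satisfaction = x / dist=multinomial link=cumlogit solution;
  random intercept / subject=area;
run;
```

DIST=MULTINOMIAL with LINK=CUMLOGIT gives the ordered (proportional-odds) version rather than the generalized multinomial, which seems closer to what a 4-point Likert DV calls for.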
 

hlsmith

Less is more. Stay pure. Stay poor.
#3
For MLMs I really like the "Multilevel Models: Applications Using SAS" book by Wang et al. It covers MLM multinomial models on page 147. I always find it hard to believe, but over the years I still have not run a multinomial logistic regression model. I know SAS also prints its own MLM book, which may have been updated in recent years. I would imagine that book covers the topic as well.
 

noetsi

Fortran must die
#4
Thanks hlsmith. If your data is ordered, will that create problems for a multinomial model? (I believe not, from my knowledge of logistic regression, but I have been wrong before.)

I will look for both books. If I run this and send you the results with my comments, would you tell me if I interpreted it correctly? :p
 

hlsmith

Less is more. Stay pure. Stay poor.
#5
I can try to help the best I can. The ordering shouldn't be a big issue as long as you select the right reference group - I think.
 

noetsi

Fortran must die
#6
I was wondering today how serious a problem it would be to treat a 4-point Likert variable as interval and run an ordinary (interval) multilevel model. Probably one of those questions no one agrees on.
 

noetsi

Fortran must die
#7
Ok, I have a different question. OLS regression is very common, including among academics. But the MLM literature essentially argues that if observations are nested inside something else, single-level models will generate errors; that is, the chance of a Type I or Type II error is much greater. My question is how you know when this is a problem and when it is not. People run linear regression in my area (vocational rehabilitation), yet virtually all of the data is nested (customers inside units inside areas).

How do you know whether the linear method is valid in this case, as compared to having to use multilevel models? Honestly, I am not sure statistical tests are valid here at all, because I usually have at least 60 percent of the population and often 100 percent.

To make things more confusing, what I am interested in is the nesting inside areas, and there are only 7 of these. Based on the comment below, I am not sure it is even valid to analyze this few groups with multilevel models.

"Guidelines for sample-size requirements and their implications for model complexity, the regression coefficients, variance components, and their standard errors are given in various studies and texts. For example, models with fewer than 20–25 groups may not provide accurate estimates of the regression coefficients and their standard errors, or of the variance components and their standard errors."

https://ies.ed.gov/ncee/edlabs/regions/northeast/pdf/REL_2015046.pdf

I do have a lower level called units, but no one cares about units, and there are issues because some units overlap each other while being administratively separate.
 
Last edited:

noetsi

Fortran must die
#8
This is the type of comment that drives me crazy. It is in the context of multilevel models. To me it suggests that, ignoring issues of standard errors, you can hardly use OLS for questions involving variables nested inside others.

"Traditionally, researchers tended to use model results at one level to draw statistical inference at another level [individual to group]. This has proven incorrect. The results from the two single level models frequently differ either in magnitude or in sign. The relationships found at the group level are not reliable predictors for relationships at the individual level. "

Individual variables operate at the individual level; group variables operate at a higher level, like a school. So my question would be: ignoring wrong SEs, which can be dealt with via robust SEs, can you run OLS with variables where some are nested inside others, like person in school? I know this is done a lot - formally it violates independence. But does it seriously bias the results?

And if it does, does this mean all OLS with a variable that can be placed in a hierarchy is wrong? :p

"
 

noetsi

Fortran must die
#9
An entirely different question, so as not to create a new thread.

I want to run an ICC test to see if group matters at all. I have a grouping variable with only 7 levels, which is not normally enough for multilevel analysis. There are certain procedures to correct for this, such as bootstrapping, but I am not sure you need them simply to run the ICC (which comes from an empty model).

Does anyone know if the ICC is valid with a very small number of groups, or do you have to transform the data first to run it?

And let me go another step. Say you have only 7 groups but you are very interested in differences between groups on some variable (that is, how some variable varies across groups). Should you just stick to linear regression on those variables? If you want to see how slopes vary by group on some variable (controlling for others), what is the best way to address this?
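One common alternative with so few groups is to treat group as fixed rather than random: enter the 7 areas as a categorical predictor and interact them with the slope of interest. A sketch only, with hypothetical dataset and variable names (`mydata`, `y`, `x`, `area`):

```sas
/* Fixed-effects alternative to MLM when there are very few groups.  */
/* area enters as a CLASS variable; the x*area term lets the slope   */
/* of x differ by area. All names here are placeholders.             */
proc glm data=mydata;
  class area;
  model y = x area x*area / solution;
run;
quit;
```

The F test on the x*area term then indicates whether slopes differ across areas, without trying to estimate variance components from only 7 clusters.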
 
Last edited:

hlsmith

Less is more. Stay pure. Stay poor.
#10
Yes, I have heard people say you should have 40-70 groups to use MLM. I think there is an underlying degrees-of-freedom threat going on. But if your question can best be answered using them, I don't see an issue with running the empty model and seeing how much variability the between-group effects explain. I am guessing there could be a power issue when there is a small number of groups. Perhaps this is like when you have an important variable you need to control for, like a confounder: it isn't significant but you control for it anyway given its importance. I would imagine that if the effect between groups were big enough, the model would still be beneficial.

Did you get the Wang book?
 

noetsi

Fortran must die
#11
Yes, I have heard people say you should have 40-70 groups to use MLM. I think there is an underlying degrees-of-freedom threat going on. But if your question can best be answered using them, I don't see an issue with running the empty model and seeing how much variability the between-group effects explain. I am guessing there could be a power issue when there is a small number of groups. Perhaps this is like when you have an important variable you need to control for, like a confounder: it isn't significant but you control for it anyway given its importance. I would imagine that if the effect between groups were big enough, the model would still be beneficial.

Did you get the Wang book?
Yes. He is the one who pointed out the problem I raise here. I got from him the macro that deals with interval-level MLM with small numbers of groups. I have yet to get the one that deals with categorical data (honestly, I decided just to use MLM for our data only when the DV is interval).
 

noetsi

Fortran must die
#12
If you only have 7 groups - too few, I know - will this impact the ICC? I am far from sure. Normally, I know bootstrapping is recommended for MLM analysis with so few groups, but I am not sure you even need it just to compute an ICC to see if groups matter.

The bootstrapping approaches I have require level-2 residuals. I am simply using an empty model to determine how much group matters, as a very preliminary analysis. I am not sure whether I need bootstrapping to deal with the very small number of groups or not.
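If it helps, a nonparametric cluster bootstrap can be sketched in SAS without a special macro: resample whole districts with replacement via PROC SURVEYSELECT, then refit the empty model per replicate. A rough sketch only, reusing the dataset and variable names from the PROC MIXED code later in this thread; the seed and number of replicates are arbitrary:

```sas
/* Draw 500 cluster-bootstrap samples: districts resampled with      */
/* replacement (METHOD=URS), each selected district kept intact.     */
proc surveyselect data=sasuser.ta1 out=boot seed=20230101
    method=urs samprate=1 outhits reps=500;
  cluster district_pri;
run;

/* Refit the empty model in every bootstrap replicate and collect    */
/* the variance components, giving a distribution for the ICC.       */
proc mixed data=boot noclprint;
  by Replicate;
  class district_pri;
  model q2w = / solution;
  random intercept / subject=district_pri;
  ods output CovParms=cp;
run;
```

A caution worth stating: with only 7 clusters the bootstrap distribution is itself built from very few units, so the resulting intervals are crude at best.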

While I am at it, can you even compute an ICC if your DV is categorical? (Some of my DVs are interval, some categorical.)
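For a categorical DV fit with a logit link, one common convention (not from this thread, so treat it as a suggestion) is the latent-variable ICC: the level-1 residual variance is fixed at pi^2/3 ≈ 3.29, so ICC = tau00 / (tau00 + pi^2/3). In SAS data-step form, with a made-up intercept variance:

```sas
/* Latent-variable ICC for a multilevel logit model.                 */
/* tau00 is a hypothetical level-2 intercept variance; plug in the   */
/* estimate from PROC GLIMMIX.                                       */
data icc_logit;
  tau00 = 0.35;                                  /* placeholder value */
  icc = tau00 / (tau00 + constant('pi')**2 / 3); /* pi^2/3 = 3.29    */
  put icc=;
run;
```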
 
Last edited:

hlsmith

Less is more. Stay pure. Stay poor.
#13
I liken this in my head - but could be wrong - to the number of groups being like the sample size. Can you do regression with a sample size of 5? Well, yeah. But is it the best? Probably not. If the between-group variability is high, it is easier to detect and to power, so MLM is not completely futile. However, if the between-group variability is small, you may not find a difference because there is no strong signal. The bootstrap likely comes into play as it does when your data isn't very Gaussian, or is small, and you want symmetric intervals with good coverage, so you use the bootstrap to get at the population standard deviation. You can calculate the ICC, but with these issues you cannot rule out chance, so the bootstrap helps you assess whether the ICC is beyond chance.
 

noetsi

Fortran must die
#14
It turns out that the bootstrapping macro requires PROC IML, which we do not have. So it looks like we cannot do MLM analysis (which is annoying given the amount of time I spent learning and relearning it).
 

noetsi

Fortran must die
#15
I calculated the ICC. The dependent variable here is income 2 quarters after closure. It should be noted that while there is a wide range of values, half the data has a value of 0 (people with no job).

This is the empty-model code.

proc mixed data=sasuser.ta1 covtest noclprint;
  class district_pri;                      /* grouping variable: the 7 districts */
  model q2w = / solution;                  /* intercept-only (empty) model       */
  random intercept / subject=district_pri; /* random intercept per district      */
run;

Part of the results:
[attached image: Ml.PNG - covariance parameter estimates]

My understanding is the ICC is 65073 / (65073 + 11619424), which is small - about half of a percent. I don't know whether this small effect could be much larger if I bootstrapped (aside from the macro I cannot run, I do not know how to bootstrap the data).
 

hlsmith

Less is more. Stay pure. Stay poor.
#16
Have you provided me this exact macro or a link to it? The bootstrap is likely just getting the SE (a population SD estimate), so it wouldn't change the ICC point estimate itself.

Seems like the smallest ICC value ever. What happens if you run a leave-one-out analysis? Run your model with all groups and no random effects. Now rerun it, each time dropping one of the groups. So you will run it with all groups, then seven more times, each time dropping a single group. This sensitivity analysis will tell you how sensitive the results are to any single group. If the effect estimates are comparable, you could plot them - then perhaps random effects are not overly relevant here.
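The leave-one-out idea above could be scripted with a simple macro loop. A sketch only, which assumes district_pri is coded with the numeric values 1-7 (an assumption - substitute the actual codes):

```sas
/* Leave-one-out sensitivity check: fit the fixed-effects model      */
/* eight times - once with all districts (g=0 excludes nothing),     */
/* then once with each district dropped in turn.                     */
%macro loo;
  %do g = 0 %to 7;
    title "Model excluding district &g (0 = none excluded)";
    proc glm data=sasuser.ta1;
      where district_pri ne &g;   /* assumes codes 1-7 */
      model q2w = / solution;     /* add predictors as needed */
    run;
    quit;
  %end;
%mend loo;
%loo;
```

Comparing the intercepts (and any slopes) across the eight fits shows whether a single district is driving the results.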
 

noetsi

Fortran must die
#18
So in looking at spending by group, it is obvious there are substantive differences by group. But the ICC shows only half of a percent. I am not sure how this works; the two seem inconsistent to me.

Nor am I sure how to move forward beyond simple descriptives.
 

hlsmith

Less is more. Stay pure. Stay poor.
#19
Did you model each group independently to see differences in intercepts and slopes? I got the email, but I am up against a bunch of deadlines the next couple of weeks. That, and deconstructing the macro's matrix algebra wasn't as easy as I thought :)
 

noetsi

Fortran must die
#20
Did you model each group independently to see differences in intercepts and slopes? I got the email, but I am up against a bunch of deadlines the next couple of weeks. That, and deconstructing the macro's matrix algebra wasn't as easy as I thought :)
I am glad you got it, and thank you for looking at it. There is no hurry. I am not sure what you mean by the difference in intercepts and slopes. Did you mean (I missed this before) to run the model once for each area and compare their slopes and intercepts with linear regression rather than a multilevel approach?