I'm new to multilevel modeling, and I have been working on a business project whose data have a multilevel structure. I have some idea of how to approach this problem, so I will lay out my reasoning and you can tell me whether it is right or wrong. I need someone who knows this kind of thing. Any suggestions would be much appreciated.
Data:
I can't show the data because it is confidential, but I will explain it with an example. The data were collected by asking clients to score, from 1 to 10, their experience with a certain business. The columns are:
- Client: Client id.
- Characteristic: A characteristic such as "Price", for example. There are 11 characteristics. (All Industries and Businesses have the same characteristics.)
- Business: The business which the client belongs to. There are 49 businesses.
- Industry: The Industry which the client belongs to. There are 13 industries.
- Score: Score from 1 to 10 of the experience
- X1, X2, ..., Xn: Different variables at the individual level, such as age, gender, etc.
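
Since the real data cannot be shared, the column layout above can be sketched with a small invented data frame. Everything in it is hypothetical (the ids, the three characteristics, the `Gender` column standing in for one of the X variables); it only illustrates the long format, with one row per Client and Characteristic.

```r
# Hypothetical toy data mirroring the column layout described above.
# All values are invented; they are not the confidential data.
set.seed(1)

toy <- expand.grid(
  Client         = paste0("c", 1:6),
  Characteristic = c("Price", "Quality", "Service"),
  stringsAsFactors = FALSE
)

# Each client belongs to exactly one business, and each business to one industry.
client_business   <- c(c1 = "b1", c2 = "b1", c3 = "b2", c4 = "b2", c5 = "b3", c6 = "b3")
business_industry <- c(b1 = "i1", b2 = "i1", b3 = "i2")

toy$Business <- unname(client_business[toy$Client])
toy$Industry <- unname(business_industry[toy$Business])
toy$Score    <- sample(1:10, nrow(toy), replace = TRUE)  # integer score from 1 to 10
toy$Gender   <- rep(c("F", "M"), length.out = 6)[match(toy$Client, paste0("c", 1:6))]

str(toy)
head(toy)
```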

Understanding data structure:
- An Industry can have multiple Businesses.
- A Business has only one Industry.
- A Business can have multiple Characteristics.
- A Client can rate multiple Characteristics.


From now on, purely for the sake of the example, I will assume a strict hierarchical structure.
(Strict hierarchical link from Client to Industry: Client -> Characteristic -> Business -> Industry.)
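
Whether the hierarchy really is strict can be checked directly from the grouping columns. The sketch below uses invented data and a hypothetical helper, `is_nested`, written in base R. It treats one factor as nested in another when each of its levels occurs with exactly one level of the other.

```r
# Hypothetical toy data: Business is nested in Industry, while every
# Characteristic appears in every Business (as stated in the column list).
toy <- expand.grid(
  Business       = c("b1", "b2", "b3"),
  Characteristic = c("Price", "Quality"),
  stringsAsFactors = FALSE
)
toy$Industry <- ifelse(toy$Business == "b3", "i2", "i1")

# A factor `inner` is nested in `outer` when each level of `inner`
# occurs with exactly one level of `outer`.
is_nested <- function(inner, outer) {
  n_outer <- tapply(outer, inner, function(x) length(unique(x)))
  all(n_outer == 1)
}

is_nested(toy$Business, toy$Industry)        # Business within Industry
is_nested(toy$Characteristic, toy$Business)  # Characteristic within Business
```

On this toy data the first check is TRUE and the second is FALSE. If the real data behave the same way, Characteristic is crossed with Business rather than nested in it, and the strict hierarchy is only a simplifying assumption.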
Questions that I want to answer:
- Is there a significant difference of Score by Industry?
- Is there a significant difference of Score by Business?
- Is there a significant difference of Score by Characteristic?
- How much do the characteristics in each industry contribute to the score?
- Which characteristics are rated best on average?
- How much do the characteristics in each business contribute to the score?
- Which industries can be considered equal to the country's average?
- Which businesses can be considered equal to their Industry's average?
- How much does the person's gender contribute to the score in each industry?
Because of the questions above, I want to do a regression using Score as the response variable.

Looking at the behaviour of the response variable, it appears to be count data, so I will use a Poisson response to fit the data better. For that reason I will use the glmer function from the lme4 package.
So, to answer the questions above, I think I should use the following code (I will treat Industry, Business and Characteristic as random effects because of the number of parameters):
fit <- glmer(score ~ (1|Industry/Business/Characteristic), family=poisson, data=mydata)
Which is the same as (I think):
fit <- glmer(score ~ (1|Industry) + (1| Industry:Business) + (1|Industry:Business:Characteristic), family=poisson, data=mydata)
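
Written out, and assuming the `/` notation expands as in the second call, the specification corresponds to a random-intercept Poisson model with the default log link, roughly of this form. The symbols are illustrative notation, not lme4 output: $i[n]$, $b[n]$ and $c[n]$ denote the Industry, Business and Characteristic of observation $n$.

```latex
\begin{aligned}
\text{score}_{n} \mid u, v, w &\sim \text{Poisson}(\mu_{n}), \\
\log \mu_{n} &= \beta_0 + u_{i[n]} + v_{i[n]\,b[n]} + w_{i[n]\,b[n]\,c[n]}, \\
u_{i} \sim N(0, \sigma_u^2), \quad
v_{ib} &\sim N(0, \sigma_v^2), \quad
w_{ibc} \sim N(0, \sigma_w^2).
\end{aligned}
```

The three sets of intercepts are assumed independent of one another. Observations that share an Industry, a Business or a Business-specific Characteristic are correlated because they share the corresponding intercept.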
This assumes correlation between Industry and Business, and between Industry, Business and Characteristic. But I know that I should first check whether there is a significant group effect in the data (by Industry, Business and Characteristic), by fitting a simple linear regression and comparing it to a null model with a random intercept (after this analysis I will move on to glmer and Poisson). In this case I used lm and lmer:
fit <- lm(score ~ 1, data = mydata)
Checking significant group factors:
nullmodel <- lmer(score ~ (1 | Industry), data = mydata)
anova(nullmodel,fit)
###refitting model(s) with ML (instead of REML)###
Data: mydata
Models:
fit: score ~ 1
nullmodel: score ~ (1 | Industry)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
fit 2 293601 293619 -146798 293597
nullmodel 3 289522 289549 -144758 289516 4080.9 1 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
nullmodel <- lmer(score ~ (1 | Characteristic), data = mydata)
anova(nullmodel,fit)
###refitting model(s) with ML (instead of REML)###
Data: mydata
Models:
fit: score ~ 1
nullmodel: score ~ (1 | Characteristic)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
fit 2 293601 293619 -146798 293597
nullmodel 3 291810 291837 -145902 291804 1793 1 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
nullmodel <- lmer(score ~ (1 | Business), data = mydata)
anova(nullmodel,fit)
###refitting model(s) with ML (instead of REML)##
Data: mydata
Models:
fit: score ~ 1
nullmodel: score ~ (1 | Business)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
fit 2 293601 293619 -146798 293597
nullmodel 3 286396 286423 -143195 286390 7207 1 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
OK, from these results I conclude that there is a significant Industry, Business and Characteristic group effect, so I think I have answered my first three questions.
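
The Chisq column in the three tables above can be reproduced by hand, which helps to see what is being tested. The sketch below uses only the deviances printed in the output; the variable names are my own. One caveat, as far as I understand it: a variance of zero lies on the boundary of the parameter space, so the chi-square p-value for a random intercept is usually regarded as conservative. With statistics this large it makes no practical difference.

```r
# Deviances copied from the anova() tables above.
dev_fit <- 293597                       # score ~ 1
dev_grp <- c(Industry       = 289516,
             Characteristic = 291804,
             Business       = 286390)   # score ~ (1 | group)

# Likelihood-ratio statistic: drop in deviance when the random intercept is added.
# Industry gives 4081 here versus 4080.9 in the table, because the printed
# deviances are rounded.
chisq <- dev_fit - dev_grp
chisq

# Upper-tail chi-square p-value on 1 degree of freedom (3 vs 2 parameters).
# These underflow to 0 in double precision, which R reports as "< 2.2e-16".
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)
p_value
```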
But here it gets tricky for me. So far I have only used linear models, but now I will treat the distribution of my response variable (Score) as Poisson. I then compare the linear model with the Poisson model using anova.
Results:
null.lineal <- lmer(score ~ (1 | industria), data = mydata)
null.poisson <- glmer(score ~ (1 | industria), family=poisson, data = mydata)
anova(null.lineal,null.poisson)
###refitting model(s) with ML (instead of REML)##
Data: mydata
Models:
null.poisson: score ~ (1 | industria)
null.lineal: score ~ (1 | industria)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
null.poisson 2 302317 302335 -151157 302313
null.lineal 3 289522 289549 -144758 289516 12797 1 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
From this, I conclude that a linear fit is better than assuming a Poisson-distributed response. So, should I just stay with the linear fit? In most of these comparisons (between linear and Poisson), the linear model comes out better.
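
One caution about this comparison: the two models are not nested (they differ in the response distribution, not in a set of terms), so the Chisq and p-value in the table cannot be read as a likelihood-ratio test. The AIC gap is the more defensible number, provided both log-likelihoods include all their normalising constants. The sketch below recomputes the gaps from the printed table, then shows a crude check of the Poisson mean-variance assumption on an invented `score` vector, which is only a stand-in for `mydata$score`.

```r
# Values copied from the anova() table above.
aic <- c(null.poisson = 302317, null.lineal = 289522)
dev <- c(null.poisson = 302313, null.lineal = 289516)

aic_gap <- unname(aic["null.poisson"] - aic["null.lineal"])  # lower AIC favours the linear model
dev_gap <- unname(dev["null.poisson"] - dev["null.lineal"])  # this is the reported Chisq
c(aic_gap = aic_gap, dev_gap = dev_gap)

# Hypothetical stand-in for mydata$score: integer scores bounded between 1 and 10.
set.seed(42)
score <- sample(1:10, 1000, replace = TRUE,
                prob = c(1, 1, 1, 2, 3, 5, 8, 10, 8, 5))

# A Poisson response has variance equal to its mean; a ratio far from 1
# suggests that assumption does not hold for the marginal distribution.
c(mean = mean(score), variance = var(score), ratio = var(score) / mean(score))
```

This is only a rough diagnostic, since the model's Poisson assumption is conditional on the random effects. Still, a score bounded between 1 and 10 tends to have less variance than a Poisson variable with the same mean.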
So:
Is my reasoning OK? Please let me know whether I am doing something wrong or whether it is right. One of my worries is that I don't know whether there is something else I should consider.
What would your approach be to answer the questions above? Any suggestions?