**Data:** I can't show the data because it is confidential, but I will explain it with an example. The data were collected by asking clients to score, from 1 to 10, their experience with a certain business. The columns are:

- Client: Client id.
- Characteristic: A characteristic, such as "Price". There are 11 characteristics, and all Industries and Businesses share the same ones.
- Business: The business which the client belongs to. There are 49 businesses.
- Industry: The Industry which the client belongs to. There are 13 industries.
- Score: Score from 1 to 10 for the experience.
- X1, X2, ..., Xn: Individual-level variables such as age, gender, etc.

**Understanding data structure:**

- An Industry can have multiple Businesses.
- A Business belongs to only one Industry.
- A Business can have multiple Characteristics.
- A Client can rate multiple Characteristics.

From now on, purely for the sake of the example, I will assume a strict hierarchical structure.

(A strict hierarchical link from Client to Industry: Client -> Characteristic -> Business -> Industry.)
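The structure described above can be checked directly in the data before choosing a random-effects formula. The following is a minimal sketch in base R on a made-up toy data frame (the values and the names `toy`, `business_nested` and `characteristic_crossed` are hypothetical; only the column names come from the description above):

```r
# Hypothetical toy data; only the column names follow the description above.
toy <- data.frame(
  Industry       = c("Retail", "Retail", "Retail", "Banking", "Banking"),
  Business       = c("ShopA", "ShopA", "ShopB", "BankA", "BankA"),
  Characteristic = c("Price", "Service", "Price", "Price", "Service"),
  stringsAsFactors = FALSE
)

# Number of distinct industries each business appears under.
industries_per_business <- tapply(
  toy$Industry, toy$Business,
  function(x) length(unique(x))
)

# TRUE when every business sits in exactly one industry (nested).
business_nested <- all(industries_per_business == 1)

# Number of distinct businesses each characteristic appears under.
businesses_per_characteristic <- tapply(
  toy$Business, toy$Characteristic,
  function(x) length(unique(x))
)

# TRUE when at least one characteristic is shared by several businesses.
characteristic_crossed <- any(businesses_per_characteristic > 1)

print(business_nested)
print(characteristic_crossed)
```

In this toy example every business belongs to one industry, while the same characteristics appear under several businesses, which is the situation described in the column list (all Businesses share the same 11 characteristics).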

**Questions that I want to answer:**

- Is there a significant difference in Score by Industry?
- Is there a significant difference in Score by Business?
- Is there a significant difference in Score by Characteristic?
- How much do the characteristics in each industry contribute to the score?
- Which characteristics are better rated on average?
- How much do the characteristics in each business contribute to the score?
- Which industries can be considered equal to the country's average?
- Which businesses can be considered equal to their Industry's average?
- How much does a person's gender contribute to the score within each industry?

**Approach:** Because of the questions above, I want to fit a regression using **Score** as the response variable.

Looking at the behaviour of the response variable, I treat it as count data, so I will use a Poisson response to fit the data better. For that reason I will use the glmer function from the lme4 package.
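One quick, informal way to look at whether a Poisson response is plausible is to compare the sample mean and variance of the scores, since a Poisson variable has variance equal to its mean. The sketch below uses made-up scores, not the confidential data, and the names `scores`, `mean_score`, `var_score` and `dispersion_ratio` are hypothetical:

```r
# Made-up scores on the 1-10 scale, for illustration only.
scores <- c(7, 8, 9, 10, 8, 7, 9, 6, 8, 10)

mean_score <- mean(scores)  # sample mean
var_score  <- var(scores)   # sample variance (n - 1 denominator)

# Under a Poisson distribution this ratio would be close to 1;
# values well below 1 suggest underdispersion, well above 1 overdispersion.
dispersion_ratio <- var_score / mean_score

print(c(mean = mean_score, variance = var_score, ratio = dispersion_ratio))
```

This is only a rough marginal check; it ignores the grouping structure and the fact that the score is bounded between 1 and 10.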

To answer the questions above, I think I should use the following code (I will treat Industry, Business and Characteristic as random effects because of the number of parameters):

    library(lme4)
    fit <- glmer(score ~ (1 | Industry/Business/Characteristic), family = poisson, data = mydata)

Which, I think, is the same as:

    fit <- glmer(score ~ (1 | Industry) + (1 | Industry:Business) + (1 | Industry:Business:Characteristic), family = poisson, data = mydata)
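Written out, and assuming the default log link of family = poisson, the nested random-intercept model above corresponds to the following. The symbols are notation introduced here for illustration: $i$ indexes Industry, $j$ Business within Industry, $k$ Characteristic within Business, and $c$ the client's observation.

$$
\text{score}_{ijkc} \mid u_i, v_{ij}, w_{ijk} \sim \text{Poisson}(\mu_{ijk}), \qquad
\log \mu_{ijk} = \beta_0 + u_i + v_{ij} + w_{ijk}
$$

$$
u_i \sim N(0, \sigma_I^2), \qquad v_{ij} \sim N(0, \sigma_B^2), \qquad w_{ijk} \sim N(0, \sigma_C^2)
$$

Here $\beta_0$ is the overall intercept, and each of the three terms in the formula contributes one variance component.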

This assumes correlation between Industry and Business, and between Industry, Business and Characteristic. But I know that I should first check whether there is a significant group factor in the data (by Industry, Business and Characteristic), by fitting a simple linear regression and comparing it with the null model (after this analysis I will move to glmer and Poisson). In this case I used lm and lmer:

    fit <- lm(score ~ 1, data = mydata)

**Checking significant group factors:**

    nullmodel <- lmer(score ~ (1 | Industry), data = mydata)
    anova(nullmodel, fit)

    refitting model(s) with ML (instead of REML)
    Data: mydata
    Models:
    fit: score ~ 1
    nullmodel: score ~ (1 | Industry)
              Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
    fit        2 293601 293619 -146798   293597
    nullmodel  3 289522 289549 -144758   289516 4080.9      1  < 2.2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
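For reference, the Chisq value in this output is the likelihood-ratio statistic, which equals the difference between the two deviances. Using the rounded values printed for the Industry comparison (the symbols $\ell$ for log-likelihood and $D$ for deviance are notation introduced here):

$$
\chi^2 = 2\,(\ell_{\text{nullmodel}} - \ell_{\text{fit}}) = D_{\text{fit}} - D_{\text{nullmodel}} = 293597 - 289516 = 4081 \approx 4080.9
$$

The small discrepancy comes from the rounding of the printed deviances. The statistic is compared against a chi-squared distribution with 1 degree of freedom, the difference in Df between the two models (3 versus 2).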

    nullmodel <- lmer(score ~ (1 | Characteristic), data = mydata)
    anova(nullmodel, fit)

    refitting model(s) with ML (instead of REML)
    Data: mydata
    Models:
    fit: score ~ 1
    nullmodel: score ~ (1 | Characteristic)
              Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
    fit        2 293601 293619 -146798   293597
    nullmodel  3 291810 291837 -145902   291804   1793      1  < 2.2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    nullmodel <- lmer(score ~ (1 | Business), data = mydata)
    anova(nullmodel, fit)

    refitting model(s) with ML (instead of REML)
    Data: mydata
    Models:
    fit: score ~ 1
    nullmodel: score ~ (1 | Business)
              Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
    fit        2 293601 293619 -146798   293597
    nullmodel  3 286396 286423 -143195   286390   7207      1  < 2.2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

OK, from these results I can conclude that there is a significant Industry, Business and Characteristic group factor, so I think this answers my first three questions.

But here it gets tricky for me. So far I have only used linear models, but now I want to treat the distribution of my response variable (Score) as Poisson. So I compare the linear model with the Poisson model using anova.

Results:

    null.lineal <- lmer(score ~ (1 | industria), data = mydata)
    null.poisson <- glmer(score ~ (1 | industria), family = poisson, data = mydata)
    anova(null.lineal, null.poisson)

    refitting model(s) with ML (instead of REML)
    Data: mydata
    Models:
    null.poisson: score ~ (1 | industria)
    null.lineal: score ~ (1 | industria)
                 Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)
    null.poisson  2 302317 302335 -151157   302313
    null.lineal   3 289522 289549 -144758   289516 12797      1  < 2.2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

From this, I conclude that a linear fit is better than assuming a Poisson-distributed response. So, should I just stay with the linear fit? In most of these comparisons (between linear and Poisson), the linear model comes out better.

**So:**

Is my reasoning OK? Please let me know whether I am doing something wrong or whether it is right. One of my difficulties is that I don't know if there is something else I should consider.

What would your approach be to answer the questions above? Any suggestions?