# Model checking and generalized multilinear regression

#### soriano

Dear all,

In my research I am analyzing the quality of game-playing programs by having each one play many games against a fixed opponent and counting the number of victories. It seems clear that the number of victories should be modeled by a binomial distribution, since each game is either a victory or a defeat and the number of games is fixed.

I have two different software types, and I vary a certain numeric factor for both of them. My hypothesis is that increasing this factor increases the number of victories for both software types, but that it increases faster for one than for the other.

To test this hypothesis I am fitting a generalized linear model in R with several predictors, assuming a binomial model. In R syntax, that is: `glm(formula = cbind(Win, Lose) ~ Size + Group + Size:Group, family = binomial)`, where Size is the numeric factor I mentioned above and Group is a factor that identifies the software type. I am looking at the coefficient of Size:Group to test whether there is an interaction between these two predictors (and therefore whether Size affects one Group more than the other). At first sight I get a good result: the Size:Group coefficient is -0.023752, with p = 0.000164.

However, R also prints the "Residual Deviance", which in this case is 34.027 on 10 degrees of freedom. Computing the chi-square tail probability, I find: `1-pchisq(34.027,10) = 0.0001827632`. As far as I understand, this means that there is a statistically significant difference between the data points and the predictions of the model. So it seems that the binomial model is not adequate? This is very puzzling to me, because by the nature of the number-of-victories variable, it should be binomial.
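For readers who want to reproduce the steps described above, here is a minimal sketch. The data are simulated and purely hypothetical (7 values of Size for each of 2 groups, 100 games per configuration, chosen only so that the fit has 10 residual degrees of freedom as in the post); only the last line uses numbers taken from the post itself.

```r
# Hypothetical data: 7 Size values x 2 software types = 14 rows.
# The number of games and the true probabilities are assumptions.
set.seed(1)
games <- 100
d <- expand.grid(Size = seq(10, 70, by = 10), Group = factor(c("A", "B")))
p_true <- plogis(-1 + 0.03 * d$Size - 0.015 * d$Size * (d$Group == "B"))
d$Win  <- rbinom(nrow(d), size = games, prob = p_true)
d$Lose <- games - d$Win

# Binomial GLM with an interaction, as in the post
fit <- glm(cbind(Win, Lose) ~ Size + Group + Size:Group,
           family = binomial, data = d)

# Estimate, standard error, z value and p-value of the interaction term
interaction_row <- coef(summary(fit))["Size:GroupB", ]
print(interaction_row)

# Residual deviance, its degrees of freedom, and the chi-square tail probability
dev   <- deviance(fit)
df    <- df.residual(fit)
p_gof <- pchisq(dev, df, lower.tail = FALSE)
print(c(deviance = dev, df = df, p = p_gof))

# The same tail probability for the values reported in the post
p_post <- pchisq(34.027, df = 10, lower.tail = FALSE)
print(p_post)
```

Using `lower.tail = FALSE` is equivalent to `1 - pchisq(...)` but avoids loss of precision when the tail probability is very small.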

If I instead fit an ordinary multiple linear regression to the proportions for each Size (in R: `lm(formula = Win ~ Size + Group + Size:Group)`), I still find a negative coefficient for Size:Group, but with a much higher p-value: p = 0.079729. Testing the model itself, however, gives a much better result (Multiple R-squared: 0.8028, Adjusted R-squared: 0.7437, F-statistic: 13.57 on 3 and 10 DF, p-value: 0.0007372). I am very confused, because the best model should be the binomial one. Also, this linear regression only considers the averages; it has no information about how many samples are behind each average. Moreover, p = 0.079 is a bit too high, and it does not look like I could improve on it by running more samples, because the simple linear regression does not use any information about sample size.
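The point about sample size can be illustrated with a small sketch. The proportions below are invented for illustration, and the helper `se_interaction` is a hypothetical name: the linear model sees only the 14 proportions, whereas the standard error of the binomial interaction coefficient depends on how many games are assumed to stand behind each proportion.

```r
# Hypothetical win proportions: first 7 rows are Group A, last 7 are Group B
d <- expand.grid(Size = seq(10, 70, by = 10), Group = factor(c("A", "B")))
d$Prop <- c(0.30, 0.38, 0.45, 0.55, 0.62, 0.70, 0.76,
            0.28, 0.32, 0.37, 0.41, 0.46, 0.50, 0.55)

# Ordinary linear regression on the proportions: the number of games
# never enters the model, so it cannot affect the p-value
fit_lm <- lm(Prop ~ Size + Group + Size:Group, data = d)
print(summary(fit_lm)$coefficients["Size:GroupB", ])
print(summary(fit_lm)$r.squared)

# Binomial GLM on the same proportions, for an assumed number of games
# per configuration; returns the standard error of the interaction term
se_interaction <- function(games) {
  win <- round(d$Prop * games)
  fit <- glm(cbind(win, games - win) ~ Size + Group + Size:Group,
             family = binomial, data = d)
  coef(summary(fit))["Size:GroupB", "Std. Error"]
}

se_100  <- se_interaction(100)
se_1000 <- se_interaction(1000)
print(c(se_100 = se_100, se_1000 = se_1000))
```

With identical proportions, the binomial standard error shrinks as the assumed number of games grows, while the `lm` output stays exactly the same.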

I also ran separate regressions for each Group, testing only the effect of Size on the number of victories. I tried both a simple linear regression (`lm(formula = Win ~ Size)`) and a generalized linear model (`glm(formula = cbind(Win, Lose) ~ Size, family = binomial)`). The graphs of the simple linear regression actually "look better" than the ones for the generalized linear model (I mean, the lines are closer to the data points). In other parts of my paper draft I perform point-to-point comparisons of the proportions (for a fixed Size, I compare the proportions of the two Group types, and for a fixed Group, the proportions at two different values of Size). For these comparisons I am assuming a binomial model (I am using `prop.test` in R), due to the nature of the data. But maybe it is odd to assume a binomial model for these comparisons while performing a simple linear regression instead of a generalized one?
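As a rough sketch of the two kinds of analysis mentioned above, the following uses invented counts (100 games per configuration is an assumption) to show a `prop.test` comparison at one fixed Size, and a per-group linear fit next to a per-group binomial fit with both sets of fitted values on the proportion scale.

```r
# Point-to-point comparison at one fixed Size: hypothetical wins out of
# 100 games for each software type
wins  <- c(A = 62, B = 46)
games <- c(A = 100, B = 100)
cmp <- prop.test(x = wins, n = games)
print(cmp$estimate)   # the two observed proportions
print(cmp$p.value)    # test of equal proportions

# Per-group regressions for one software type (hypothetical counts)
one <- data.frame(Size = seq(10, 70, by = 10),
                  Win  = c(30, 38, 45, 55, 62, 70, 76))
one$Lose <- 100 - one$Win
one$Prop <- one$Win / 100

fit_lin <- lm(Prop ~ Size, data = one)
fit_bin <- glm(cbind(Win, Lose) ~ Size, family = binomial, data = one)

# fitted() returns values on the proportion scale for both models, so the
# distance between each fitted curve and the data can be compared directly
sse_lin <- sum((fitted(fit_lin) - one$Prop)^2)
sse_bin <- sum((fitted(fit_bin) - one$Prop)^2)
print(c(sse_lin = sse_lin, sse_bin = sse_bin))
```

The squared-distance comparison is only a numeric stand-in for "the lines are closer to the data points"; it says nothing by itself about which model is more appropriate.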

So, overall I am a bit lost about all this. Any help and suggestions are welcome! Thank you very much!

Yours,
Leandro