Replication in logistic regression


New Member
Hello. I'm planning a seed germination experiment where n total seeds are evenly distributed across i replicates of j treatments (so that n equals i*j*number of seeds per replicate). I will compare the proportion of germinated seeds among treatments using a logistic regression model.

My question is: given a fixed total number of seeds (n), does the distribution of seeds across replicates affect the power of a logistic regression model? Say, is having 100 seeds in each of 10 replicates (n=1000) the same as having 10 seeds in each of 100 replicates?

My confusion arises because, as far as I know, the chi-square statistic gets its power from the total number of elements counted (n?).

I would certainly appreciate any comment on this.


TS Contributor
Hi rolo,

Although the n is the same, a higher number of replicates will increase the power of the experiment. Also, I assume the seed type will be treated as an independent variable in the model, so it would be a good idea not to have too many seed types, in favor of a higher number of replicates. Just consider that with 100 seed types you would need 99 dummy variables to analyze them.

I don’t read Rolo the way Terzi does. But I agree with Terzi that the more categories there are, the harder they are to compare and the lower the power. With 10 categories there are many pairs to compare (10*9/2 = 45). With 20 categories there are 20*19/2 = 190 pairs. But even if we ignore the number of pairs and just compare category F with G, increasing the number of categories decreases the number of seeds in F and G, and thus lowers the power.
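
A quick sketch of that counting argument. The total of 1000 seeds is only an assumption borrowed from the example in the question, and the function name is made up for illustration.

```python
from math import comb

def pairwise_comparisons(categories):
    """Number of distinct pairs among `categories` treatment groups: k*(k-1)/2."""
    return comb(categories, 2)

total_seeds = 1000  # assumed total, taken from the example in the question

for k in (10, 20):
    pairs = pairwise_comparisons(k)
    seeds_per_category = total_seeds // k
    # More categories means more pairs to compare and fewer seeds behind each one.
    print(f"{k} categories: {pairs} pairs, {seeds_per_category} seeds per category")
```

Doubling the number of categories roughly quadruples the number of pairs while halving the seeds available for any single comparison.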

No, I read Rolo as saying that there is a quantitative variable, such as moisture. Let's call it x_j, so that x_j is the amount of moisture at treatment j, where j = 1, …, J. So how large should J be? How many levels?

If Rolo is absolutely sure that the relationship is linear, the best thing is to stretch out the x-variable as far as possible: half of the seeds very high up on x and the other half very low.

But we can't be sure that the model is true, and we might stretch out too far. And we can’t be sure that the model is linear. With moisture there is probably a curved relationship, increasing first and decreasing later.
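
To put rough numbers on this trade-off, here is a minimal sketch that compares the large-sample variance of the slope estimate under three designs, using the inverse of the Fisher information of a logistic model that is linear in x. Everything specific here is an assumption for illustration: the coefficients (beta0 = 0, beta1 = 1), the x-ranges, and the total of 1000 seeds. It also assumes the linear logit model is actually true, which is exactly what we can't be sure of.

```python
import math

def slope_variance(design, beta0, beta1):
    """Asymptotic variance of the slope estimate in logit(p) = beta0 + beta1*x.

    design: list of (x, number_of_seeds) pairs.
    Uses the 2x2 Fisher information with weights m * p * (1 - p).
    """
    i00 = i01 = i11 = 0.0
    for x, m in design:
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
        w = m * p * (1.0 - p)
        i00 += w
        i01 += w * x
        i11 += w * x * x
    det = i00 * i11 - i01 ** 2
    return i00 / det  # slope entry of the inverse information matrix

n = 1000                 # assumed total number of seeds
beta0, beta1 = 0.0, 1.0  # hypothetical true coefficients

two_point = [(-1.5, n // 2), (1.5, n // 2)]                  # half low, half high
spread = [(-1.5 + 3.0 * k / 9, n // 10) for k in range(10)]  # 10 evenly spaced levels
too_far = [(-6.0, n // 2), (6.0, n // 2)]                    # extremes where p is near 0 or 1

for name, design in [("two-point", two_point), ("spread", spread), ("too far", too_far)]:
    print(name, slope_variance(design, beta0, beta1))
```

Under these assumptions the two-point design gives the smallest slope variance, the evenly spread design is worse, and pushing the two points out to where nearly every seed either germinates or fails is worse still. The two-point design cannot reveal curvature at all, though, which is the price of that efficiency.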

I would maybe take 10 to 20 levels. There may be practical difficulties in having too many levels, so I don’t want too many. I would probably put a larger fraction of the seeds in the upper and lower parts of the range and keep a smaller fraction in the middle, to be able to check linearity and curvature.

With 5 to 10 seeds per level Rolo can easily plot the observed fraction against x, and later the estimated logit versus x, and thereby check linearity.
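
A small sketch of the numbers one would plot for that check. The counts below are made up purely for illustration (they are not data from any experiment), and the 0.5 added to each cell is one common adjustment that keeps the logit finite when all or none of the seeds at a level germinate.

```python
import math

def empirical_logit(germinated, seeds):
    """Empirical logit with a 0.5 adjustment so 0% and 100% stay finite."""
    return math.log((germinated + 0.5) / (seeds - germinated + 0.5))

# (x level, germinated, seeds): invented counts, 10 seeds per level
levels = [(1, 1, 10), (2, 3, 10), (3, 5, 10), (4, 8, 10), (5, 10, 10)]

for x, y, m in levels:
    fraction = y / m
    logit = empirical_logit(y, m)
    # Plot fraction vs x and logit vs x; a roughly straight logit line supports linearity.
    print(f"x={x}  fraction={fraction:.2f}  empirical logit={logit:.2f}")
```

If the empirical logits bend clearly when plotted against x, a quadratic term in x would be worth considering.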

"Say is it the same to have 100 seeds distributed in 10 replicates (n=1000) as to having 10 seeds distributed in 100 replicates?"
If they were placed at around the same x-values then I would say that it is about the same.
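
One way to see why, as a sketch under a strong assumption: if every seed germinates independently with a probability that depends only on x (no extra dish-to-dish variation), then the binomial log-likelihood depends on the data only through the totals at each x-value, so splitting the same seeds into more or fewer replicates does not change the fit. The x-levels, coefficients, and function names below are hypothetical.

```python
import math
import random

def loglik_kernel(groups, beta0, beta1):
    """Binomial log-likelihood without the constant binomial coefficients.

    groups: iterable of (x, germinated, seeds).
    """
    total = 0.0
    for x, y, m in groups:
        eta = beta0 + beta1 * x
        total += y * eta - m * math.log1p(math.exp(eta))
    return total

def simulate(x_levels, reps, seeds_per_rep, beta0, beta1, rng):
    """Simulate germination counts for `reps` replicates at each x level."""
    data = []
    for x in x_levels:
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
        for _ in range(reps):
            y = sum(rng.random() < p for _ in range(seeds_per_rep))
            data.append((x, y, seeds_per_rep))
    return data

def pool(data):
    """Collapse all replicates at the same x into one row of totals."""
    totals = {}
    for x, y, m in data:
        gy, gm = totals.get(x, (0, 0))
        totals[x] = (gy + y, gm + m)
    return [(x, y, m) for x, (y, m) in totals.items()]

rng = random.Random(1)
many_small = simulate([0.0, 1.0], reps=100, seeds_per_rep=10,
                      beta0=-0.5, beta1=1.0, rng=rng)

# The replicate-level and pooled likelihoods agree at any trial coefficients.
for b0, b1 in [(-0.5, 1.0), (0.0, 0.0), (0.3, -0.7)]:
    diff = loglik_kernel(many_small, b0, b1) - loglik_kernel(pool(many_small), b0, b1)
    print(b0, b1, abs(diff) < 1e-6)
```

If replicates do differ from each other beyond binomial sampling (overdispersion), this equivalence breaks down and the number of replicates starts to matter.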

[Rollo was the first Viking chief in Normandy, whose descendant William the Conqueror conquered England, while others conquered Sicily. Interesting name!]