assumption of independence

Why is it important that cases are independent in regression (especially logistic regression)?

By "independent" I mean that observations under one condition are not linked in any way to observations under another condition, as in a repeated-measures design where multiple data points come from the same subject.


Ninja say what!?!
Most regressions people use are parametric. That means they are based on some probability distribution, such as the Gaussian (normal) distribution. For those distributional assumptions to hold, the observations must be independent.
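To see why independence matters for the math: when observations are independent, the joint likelihood factors into a product of per-observation terms, so the log-likelihood is a simple sum over the rows of your data. That sum is exactly what standard logistic regression software maximizes. A minimal sketch with made-up toy data:

```python
import numpy as np

# Under independence the joint likelihood is a product of per-observation
# terms, so the log-likelihood is a plain sum over observations.
def logistic_loglik(beta, X, y):
    p = 1.0 / (1.0 + np.exp(-X @ beta))  # P(y_i = 1 | x_i)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# toy data: intercept column plus one predictor (made-up values)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0, 1, 1])

# with beta = 0, every fitted probability is 0.5, so each of the
# three observations contributes log(0.5) to the sum
print(logistic_loglik(np.array([0.0, 0.0]), X, y))  # 3 * log(0.5)
```

If the observations were dependent, the joint likelihood would no longer factor this way, and the sum above would be the wrong objective to maximize.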

Another way to think about it is through causal inference. Let's use logistic regression. People tend to say (e.g., when the OR = 2) that the odds of the event double when the independent variable is present. If your data were not independent and you did not control for that dependence, then the estimate of 2 would be off. Thus, you would be misleading people with that claim.
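Here's a minimal simulation (made-up cluster sizes and effect sizes) of the mechanism: when observations share a cluster-level effect, the estimated event proportion bounces around far more across samples than iid theory predicts, so naive standard errors, and hence claims about the OR, overstate your certainty.

```python
import numpy as np

rng = np.random.default_rng(0)

n_clusters, per_cluster = 50, 10   # hypothetical sizes: 500 obs total
n = n_clusters * per_cluster
reps = 2000

def sample_prop(dependence):
    # each cluster shares a random shift of its event probability;
    # dependence = 0 gives fully independent Bernoulli(0.5) draws
    cluster_p = np.clip(0.5 + dependence * 0.3 * rng.normal(size=n_clusters), 0, 1)
    y = rng.binomial(1, np.repeat(cluster_p, per_cluster))
    return y.mean()

indep = np.array([sample_prop(0.0) for _ in range(reps)])
dep = np.array([sample_prop(1.0) for _ in range(reps)])

naive_var = 0.5 * 0.5 / n  # variance iid theory predicts for p = 0.5
print(indep.var(), dep.var(), naive_var)
```

The independent case matches the iid formula, while the clustered case shows a much larger true variance than the naive formula admits. That gap is what makes the "OR = 2" claim misleading when dependence is ignored.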
thanks a lot! :)

But I'm still wondering why parametric statistics rests on this assumption. Suppose I have two data sets with the same general distributions, one generated from dependent measures and the other from independent measures. What is special about the former that prohibits its use with logistic regression? I'm still interested in knowing how the odds of the event occurring change from one condition to another.

Follow-up question: how does one go about controlling for dependence? Is it a similar issue to multicollinearity?


Ambassador to the humans
Independence is important because we can get biased results otherwise.

Imagine you have a population of mice and you know that for the entire population the average amount they eat per "meal" is 30 kirbs (a made-up unit), but you also know that the amount they eat is similar for mice that come from the same litter. Now suppose one mouse eats 120 kirbs during a meal and you want to estimate how much his brother eats. Using the estimate of 30 kirbs from the general population seems like it might not be a good one. We'd probably want an estimate that's at least larger than 30, but maybe not as large as 120.

When all the observations are independent, we don't have to worry about the correlation between subjects, and this becomes a non-issue.
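The mouse intuition can be made concrete with a toy shrinkage formula. Assuming (hypothetically) that littermates' meal sizes are correlated with some intraclass correlation `rho`, the best linear guess for the brother pulls the observed sibling's value toward the population mean:

```python
# Toy shrinkage estimate for the brother's meal size, assuming a
# made-up intraclass correlation (rho) between littermates.
def brother_estimate(population_mean, observed_sibling, rho):
    # best linear predictor under a simple correlated-normal sketch:
    # shrink the sibling's value toward the population mean
    return population_mean + rho * (observed_sibling - population_mean)

print(brother_estimate(30, 120, 0.5))  # 75.0: above 30, but below 120
```

With `rho = 0` (independence) the answer is just the population mean of 30; with `rho = 1` it's the sibling's 120. Any in-between correlation gives an estimate between the two, exactly the intuition above.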


Ninja say what!?!
There is nothing that prohibits you from using logistic regression. You can still use it; just remember that the answers you get will probably not be correct.

As for your follow-up question, dependence is not the same as multicollinearity. There are many ways of controlling for it. A common method I see is just putting the grouping variable into the model as a covariate, possibly with an interaction term. Though this might control for it, caution should be used.
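Just to illustrate what "throwing the variable into the model" looks like mechanically, here is a sketch of building a design matrix that includes a hypothetical grouping variable (litter, dummy-coded) and its interaction with the condition of interest, using made-up data:

```python
import numpy as np

# Hypothetical setup: a binary condition x and a grouping variable
# (litter) that induces dependence; sizes and values are made up.
x = np.array([0, 1, 0, 1, 0, 1])       # condition of interest
litter = np.array([0, 0, 1, 1, 2, 2])  # source of dependence

# dummy-code the grouping variable (drop level 0 as the reference)
dummies = (litter[:, None] == np.arange(1, litter.max() + 1)).astype(float)

# design matrix: intercept, x, litter dummies, x-by-litter interactions
X = np.column_stack([
    np.ones_like(x, dtype=float),  # intercept
    x.astype(float),               # condition
    dummies,                       # 2 litter dummies
    x[:, None] * dummies,          # 2 interaction columns
])
print(X.shape)  # (6, 6)
```

This design matrix would then be passed to a logistic regression fit as usual; the litter columns absorb the cluster-level shifts. As noted above, though, this only partially addresses dependence, so caution is warranted.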