# What is bias?

#### noetsi

##### Fortran must die
This confuses me. It is an answer about bias posted on a board (one that I have heard before). To me this is an incomplete definition of bias. Note that the original question dealt with heterogeneity of results.

"Bias is the expected difference between the population parameter and the value (statistic) used to estimate it. When the value is the parameter itself, that expectation is zero. (In fact, the difference is zero, not just the expectation.) If you do have the whole population, though, testing and other forms of inference do not make sense. The point of inference is to try to guess intelligently about an unknown population."

But to me that seems an incorrect definition of what bias is (or maybe I think about it wrong).
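
For reference, the quoted definition is usually written as below. The symbols $\theta$ (the population parameter) and $\hat{\theta}$ (the statistic used to estimate it) are notation assumed here, not taken from the quoted answer:

$$
\operatorname{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta} - \theta] = \mathbb{E}[\hat{\theta}] - \theta
$$

When the statistic is the parameter itself, $\hat{\theta} = \theta$, so the difference is zero and not just its expectation. Note that this definition takes the target $\theta$ as given; it says nothing about whether $\theta$ is the right quantity to be estimating.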

My response (and confusion):
"For instance, if a and b drive the dependent variable, and a and b are related, and you leave b out of the model, then the relationship between a and the dependent variable would be wrong in your model even with the whole population. To me that is bias, but maybe not. Or if you use a linear relationship for non-linear relationships."

#### hlsmith

##### Less is more. Stay pure. Stay poor.
You are both right. The top description is incomplete with regard to study design or identifiability issues (e.g., correct model specification and available covariates).
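
The omitted-variable case from the first post can be sketched with simulated data. This is only an illustration under stated assumptions: the "population" is 20,000 simulated rows, and the coefficients 2.0 and 3.0 and the 0.8 link between a and b are made up for the example.

```python
import random

random.seed(0)
N = 20_000  # treat these rows as the ENTIRE population (simulated, hypothetical)

# b drives a, and both drive y. All coefficients here are assumptions.
b = [random.gauss(0, 1) for _ in range(N)]
a = [0.8 * bi + random.gauss(0, 1) for bi in b]
y = [2.0 * ai + 3.0 * bi + random.gauss(0, 1) for ai, bi in zip(a, b)]


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Population covariance (divide by N, since this is the whole population)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (v - my) for x, v in zip(xs, ys)) / len(xs)


# Short model, y ~ a: b is left out.
slope_short = cov(a, y) / cov(a, a)

# Long model, y ~ a + b: solve the 2x2 normal equations by Cramer's rule.
saa, sab, sbb = cov(a, a), cov(a, b), cov(b, b)
say, sby = cov(a, y), cov(b, y)
det = saa * sbb - sab ** 2
slope_long = (say * sbb - sby * sab) / det

print(f"slope on a, b omitted:  {slope_short:.3f}")
print(f"slope on a, b included: {slope_long:.3f}")
```

Under these assumptions the short model's slope settles near 2 + 3 * cov(a, b) / var(a), roughly 3.5, instead of the 2.0 used to generate the data. There is no sampling involved, so the gap comes from the model specification alone.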

#### noetsi

##### Fortran must die
I know they are technically right. But it is certainly incomplete.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
I guess a way I think about bias is that if you increased the sample size up toward the population size, you would never get the right estimate because something is wrong somewhere!

#### noetsi

##### Fortran must die
Well, in my case I have the population, so that is not a good way to think of bias. I think there is method bias as well as the standard definition you raise.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Agreed, my description neglected to just say "truth". Regardless of whether you have a sample or the population, and regardless of chance, a biased estimate is not equal to the truth.

#### noetsi

##### Fortran must die
I have no illusions about getting at the truth in my models, but I understand what you mean.

#### fed2

##### Active Member
"I guess a way I think about bias is that if you increased the sample size up toward the population size, you would never get the right estimate because something is wrong somewhere!"
That sounds more like a consistency problem, i.e., the estimator is asymptotically biased.
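
A rough way to see the difference is to compare two estimators as the sample size grows. This is a sketch under assumed values: the population is normal with mean 10 and SD 2, and the "miscalibrated instrument" with a constant 0.5 offset is a hypothetical stand-in for something being wrong somewhere.

```python
import random

random.seed(1)
TRUE_MEAN, TRUE_SD = 10.0, 2.0  # hypothetical population values
OFFSET = 0.5                    # hypothetical systematic measurement error
REPS = 1000                     # Monte Carlo replications per sample size


def var_divide_by_n(xs):
    """Variance with divisor n: biased in small samples, but consistent."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def mean_of_miscalibrated(xs):
    """Mean of readings that are all shifted by OFFSET: never converges to the truth."""
    return sum(x + OFFSET for x in xs) / len(xs)


results = {}
for n in (5, 50, 500):
    var_total = mean_total = 0.0
    for _ in range(REPS):
        sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
        var_total += var_divide_by_n(sample)
        mean_total += mean_of_miscalibrated(sample)
    results[n] = (
        var_total / REPS - TRUE_SD ** 2,  # estimated bias of the variance estimator
        mean_total / REPS - TRUE_MEAN,    # estimated bias of the miscalibrated mean
    )
    print(n, round(results[n][0], 3), round(results[n][1], 3))
```

The first estimator's bias shrinks toward zero as n grows, so it is biased but consistent. The second stays about 0.5 off at every n, which is the "you never get the right estimate" situation described in the quote.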

#### hlsmith

##### Less is more. Stay pure. Stay poor.
"Sounds more like consistency, i.e., asymptotically biased"
Correct, that statement did come off that way!

#### noetsi

##### Fortran must die
Well, there are, to me, two forms of bias. One comes from having a sample rather than a population. That is the form of bias that tends to come up. But I think there is also bias due to methods. So even with the whole population you would generate the wrong slope if, for example, you use a linear model and your relationship is non-linear.
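
The linear-model case can be sketched with a made-up population in which y is exactly x squared, with no noise and no sampling. The grid on [0, 3] and the quadratic form are assumptions chosen only to make the arithmetic easy to follow.

```python
# Hypothetical "whole population": evenly spaced x on [0, 3], with y = x^2 exactly.
N = 3001
xs = [3.0 * i / (N - 1) for i in range(N)]
ys = [x ** 2 for x in xs]

mx = sum(xs) / N
my = sum(ys) / N

# Ordinary least-squares line fitted to the full population.
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx


def true_local_slope(x):
    """Derivative of x^2, i.e. the real rate of change at x."""
    return 2.0 * x


max_abs_resid = max(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))

print(f"fitted slope: {slope:.3f}")
for x in (0.5, 1.5, 2.5):
    print(f"true slope at x={x}: {true_local_slope(x):.1f}")
print(f"largest residual: {max_abs_resid:.3f}")
```

The line reports a single slope for everyone, while the real rate of change depends on where you are on the curve. Whether to call that bias or misspecification is the point under discussion in this thread; the sketch only shows that having the whole population does not remove it.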

#### fed2

##### Active Member
I don't think lines exist in nature. Neither do normal curves.

#### fed2

##### Active Member
That's true. They may just be very small.

It's also possible they are so big we do not see them; for example, we may be inside the line right now.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
I will say that for normal curves, genetics plays an awesome role, with some genes getting turned on and others not, so something like height is a composite of a bunch of binary processes, making height an emergent normal variable.
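
The "composite of binary processes" idea can be illustrated with a toy simulation. This is not a genetic model: the 100 independent on/off switches with equal weight and a 50% chance each are assumptions made purely for the example.

```python
import random

random.seed(2)
N_PEOPLE, N_GENES = 10_000, 100  # hypothetical sizes

# Each person's trait is the count of switches that happen to be "on".
trait = [
    sum(random.random() < 0.5 for _ in range(N_GENES))
    for _ in range(N_PEOPLE)
]

m = sum(trait) / N_PEOPLE
sd = (sum((t - m) ** 2 for t in trait) / N_PEOPLE) ** 0.5

# For a normal distribution, skewness and excess kurtosis are both 0.
skew = sum(((t - m) / sd) ** 3 for t in trait) / N_PEOPLE
excess_kurt = sum(((t - m) / sd) ** 4 for t in trait) / N_PEOPLE - 3.0

print(f"mean={m:.2f} sd={sd:.2f} skew={skew:.3f} excess kurtosis={excess_kurt:.3f}")
```

With many small, independent, additive contributions, the skewness and excess kurtosis both come out close to zero, which is the central limit theorem at work. Real traits need not satisfy those independence and additivity assumptions.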

I would also say that lines or dose responses can exist within some threshold values (ranges), but mainly need to be transformed to become 'linear' in shape.

#### noetsi

##### Fortran must die
I am reminded of the joke that the only time you find normal data is when it is made up. Data is rarely, if ever, normal in the real world. It is hard for me to believe that data from real events would follow such a distribution.

#### fed2

##### Active Member
"I would also say that lines or dose responses can exist within some threshold values (ranges), but mainly need to be transformed to become 'linear' in shape."
That's a good point, because probits can be thought of as lines and normal curves.
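
A small sketch of what "lines and normal curves" means for a probit: the response probability follows the standard normal CDF of a straight line in dose, so applying the inverse CDF recovers the line. The intercept, slope, and doses below are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

A, B = -1.0, 0.5           # hypothetical intercept and slope on the probit scale
doses = [0, 1, 2, 3, 4]

# Probability scale: an S-shaped normal curve in dose.
probs = [std_normal.cdf(A + B * d) for d in doses]

# Probit scale: the inverse CDF turns the curve back into a straight line.
z = [std_normal.inv_cdf(p) for p in probs]
steps = [z[i + 1] - z[i] for i in range(len(z) - 1)]

for d, p, zi in zip(doses, probs, z):
    print(f"dose={d} prob={p:.3f} probit={zi:.3f}")
```

The probabilities bend, but the probit values rise by the same amount B for every unit of dose, which is the sense in which the transformed relationship is linear.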

"I will say that for normal curves, genetics plays an awesome role, with some genes getting turned on and others not, so something like height is a composite of a bunch of binary processes, making height an emergent normal variable."
I think it depends on childhood nutrition. People with bad nutrition never fully recover. Saw it in a book.

"I am reminded of the joke that the only time you find normal data is when it is made up."
You shouldn't make up data; that's fraud. You can pretend it is normal though; that's just another day at the office.

#### noetsi

##### Fortran must die
It's not fraud when you are simulating data for classes. But mainly it's a joke.