How do you know if you're making a Type I or Type II error?

#1
Type I and II errors seem really theoretical. In real life, how would you ever know if you're making a Type I or Type II error? Can anyone provide any practical examples of where someone doing social science research has found out they've committed one of the two errors?
 

Dason

Ambassador to the humans
#2
Unless you know the truth about the parameters you're testing you won't know if you've made an error.
 
#3
Type I and II errors seem really theoretical. In real life, how would you ever know if you're making a Type I or Type II error?
As Dason said, you can't know whether you've made an error (either type) or the correct decision. The best you can do is specify the rate at which you'll make Type I errors if the null is true (this equals your alpha) and the rate at which you'll make Type II errors if the true effect is at least some specified minimum size (this is beta; your power is 1 minus beta).

This information is definitely practical. For example, if I do 1000 independent samples t-tests and find 50 significant group differences, I might get all excited and narrow my future research efforts to those 50 differences. But if I realize that 50 "significant differences" is exactly what I'd expect to find even if all 1000 null hypotheses were true and I had an alpha of .05, I'll be wiser and know not to get overly excited about those 50. Conversely, if I find 500 significant differences, I can conclude pretty confidently that many of the null hypotheses were indeed false (though I still won't know with 100% certainty which ones are or aren't...hypothesis testing never leads to 100% certainty about your decisions regarding the null).
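If it helps to see that expectation concretely, here's a minimal simulation sketch (my own illustration, not anything from the posts above; the group sizes and random seed are arbitrary assumptions): run 1000 two-sample t-tests where every null hypothesis is true and count how many come out "significant" at alpha = .05.

```python
# Sketch: 1000 t-tests where the null is true for every single one.
# Roughly 5% (about 50) should come out "significant" purely by chance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_tests, n_per_group = 1000, 30

significant = 0
for _ in range(n_tests):
    # Both groups drawn from the same distribution, so the null is true here.
    a = rng.normal(0, 1, n_per_group)
    b = rng.normal(0, 1, n_per_group)
    _, p = stats.ttest_ind(a, b)
    significant += int(p < alpha)

print(significant)  # typically somewhere in the neighborhood of 50
```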

And the practical import of the Type II stuff is in doing power calculations / figuring out what sample size you need for a study.
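For instance, here's a quick sketch of that kind of calculation (my own example with an arbitrary "medium" effect size, not something from this thread), using statsmodels: how many participants per group does a two-sample t-test need in order to have an 80% chance (i.e., a 20% Type II error rate) of detecting an effect of d = 0.5 at alpha = .05?

```python
# Sketch of a power / sample-size calculation; effect size, alpha, and power
# here are assumed values chosen for illustration.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(round(n_per_group))  # about 64 per group; smaller samples mean more Type II errors
```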

Can anyone provide any practical examples of where someone doing social science research has found out they've committed one of the two errors?
Let's say you did an experiment, found significant results, and published them. If many other researchers repeat your study and all or nearly all of them fail to replicate your "significant" results, then I think everyone including you would feel pretty confident that your initial results had been a Type I error. (That's the charitable conclusion; others might suspect you of fudging your data).

Here's what I consider a pretty interesting and entertaining example of failure-to-replicate in psychology research:

Frank, Stein and Rosen (1970) carried out an experiment where one group of mice were trained to associate the light side of a test box with shock, another group were stressed by being rolled around in a jar but were not placed in the test box, and a third group were untrained controls with experience of the test box. When the brains or livers of these animals were removed, ground up and fed to other animals, it was found that recipients who had been fed the brains of trained animals escaped from the light faster than those that had been fed the brains of untrained controls; however, animals fed the brains of donors stressed in the jar escaped faster still. The speediest escapes were made by animals which had been fed the livers of jar-stressed donors. Stein interpreted this as showing that 'transfer' was not memory specific; rather, apparent changes in behaviour or learning-rate could be attributed to stress hormones transferred between donors and recipients. Eventually [by many attempts at replication throughout the 1960s] it became clear that RNA did play a part in memory, but it did not appear to code specific memories.
http://community.dur.ac.uk/robert.kentridge/bpp2mem1.html

Alright, so that's actually a more nuanced account of that experiment than I was previously familiar with, so it might not end up being the best example of a Type I error...perhaps the livers/brains of the test box-trained animals carried enough stress hormones to cause an actual difference in the recipient animals. But it could also have been a Type I error.

But there was a recent article in a psychology journal that purported to show experimental evidence for ESP (psychic abilities, specifically being able to predict something before it happens). It had major methodological issues, but could also have involved some Type I errors.
 
#4
Here's another example of a very interesting and weird research outcome (priming people with old-age-related words causes them to move more slowly) that couldn't be replicated and therefore might represent a Type I error in the original study (or a Type II error in the attempted replication!):

http://blogs.discovermagazine.com/n...on-bargh-psychology-study-doyen/#.U4u8E_ldXUU

A problem in psychology that relates to this is the "file drawer problem": researchers test null hypotheses; if they fail to reject the hypothesis (which could represent either a correct decision or a Type II error), they assume there won't be any interest in the study and they throw it in their file drawer rather than try to publish it; if they DO reject the null hypothesis (which could represent either a correct decision or a Type I error) they assume there will be interest and they try to publish it in journals.

The net effect, according to some, is that the academic research journals are full of both correctly rejected nulls and Type I errors*, while many studies that correctly failed to reject the null hypothesis languish in researchers' file drawers where no one ever sees them, leading to unnecessary duplication of research efforts by other researchers unaware of the existence of the unpublished studies.


*Note on something that often confuses people: if most published studies use an alpha of .05, and alpha is the Type I error rate, can we then conclude that 5% of the rejected null hypotheses in the research journals represent Type I errors? NO! Alpha is the probability of rejecting a null hypothesis given that the null is true; by itself it doesn't tell you what fraction of the rejections that make it into print are false positives. That fraction depends on how many of the nulls being tested really are true and on how much power the tests have. In the extreme case where every single null hypothesis tested in these published studies was, in fact, true, essentially every published rejection would be a Type I error, not 5% of them. Hopefully we can give researchers more credit than that, and assume they are testing meaningful, theoretically-driven null hypotheses which they have good reason to expect may be false. The larger the share of tested nulls that really are false (and the better powered the studies), the smaller the proportion of published rejections that are Type I errors--but the point is that you can't read that proportion straight off the alpha level.
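To make that concrete, here's a rough back-of-the-envelope sketch (my own illustration with made-up numbers, not figures claimed anywhere in this thread): given an assumed share of tested nulls that are actually true, an alpha of .05, and an assumed power, what fraction of the rejections would you expect to be Type I errors?

```python
# Hypothetical illustration -- the proportions and power below are assumptions,
# not figures from this thread.
def type_i_share_of_rejections(p_null_true, alpha=0.05, power=0.80):
    """Expected fraction of rejected nulls that are Type I errors."""
    false_positives = p_null_true * alpha          # true nulls wrongly rejected
    true_positives = (1 - p_null_true) * power     # false nulls correctly rejected
    return false_positives / (false_positives + true_positives)

for p in (0.9, 0.5, 0.1):
    print(f"if {p:.0%} of tested nulls are true: "
          f"{type_i_share_of_rejections(p):.1%} of rejections are Type I errors")
```

The numbers swing wildly (from roughly a third of rejections down to well under 1% under these assumed inputs) depending on how plausible the tested nulls are, which is exactly why alpha alone doesn't tell you the error rate of the published literature.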
 