Mistake in a textbook: What proportion of studies make "wrong" conclusions?

#1
I found the following statement in an undergraduate psychology textbook:

Usually researchers test their hypotheses at the .05 (or 5%) significance level. If the test is significant at this level, it means that researchers are 95% confident that the results from their studies indicate a real difference and not just a random fluke. Thus, only 5% of research conclusions should be 'flukes.'
The way I would read those sentences, they are clearly incorrect. Look at the last sentence:

only 5% of research conclusions should be 'flukes.'
This sentence refers to 5% "of research conclusions." That sounds to me like mistaken research conclusions of all types -- Type I and Type II errors combined. (I guess you could argue that "flukes" specifically refers to Type I errors, but to think that an undergraduate would read it that way is really a stretch.)

I believe any average undergraduate would read this sentence to mean "95% of all studies get their conclusion correct and 5% get it wrong." By that reading, the 5% number is clearly inaccurate.

In principle, I think you could calculate the proportion of all research studies conducted that come to "mistaken" conclusions about H0, if you did the following:

Let alpha = type I error rate
Let beta = type II error rate
Let p0 = proportion of all studies for which H0 is true
Let p1 = proportion of all studies for which H0 is false


Then the proportion of research conclusions which are incorrect could be calculated as:

p0*alpha + p1*beta

Whereas, the authors of this book seem to simply be equating alpha itself with the percentage of mistaken research conclusions, which would only be true if EVERY study EVER conducted had a true H0 (i.e. a useless treatment). Let's hope and pray that is not true.

Obviously, to actually calculate the value of p0*alpha + p1*beta requires all kinds of things - knowing p0 and p1 (which we never would) and knowing beta with perfect accuracy (which would be contingent on a flawless power analysis with a perfectly accurate effect size estimate - unrealistic).
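To make the formula concrete, here is a minimal Python sketch of p0*alpha + p1*beta. The input values (alpha = 0.05, beta = 0.20, and the three values of p0) are made up purely for illustration, the function name is hypothetical, and the sketch assumes p1 = 1 - p0, i.e. that H0 is either true or false in every study.

```python
def proportion_mistaken(alpha, beta, p0):
    """Overall proportion of studies reaching a wrong conclusion about H0.

    alpha: Type I error rate  (wrong conclusion when H0 is true)
    beta:  Type II error rate (wrong conclusion when H0 is false)
    p0:    proportion of studies for which H0 is true
    Assumption: p1 = 1 - p0 (H0 is either true or false in each study).
    """
    if not all(0.0 <= x <= 1.0 for x in (alpha, beta, p0)):
        raise ValueError("alpha, beta and p0 must all lie in [0, 1]")
    p1 = 1.0 - p0
    return p0 * alpha + p1 * beta


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to illustrate the formula.
    alpha, beta = 0.05, 0.20
    for p0 in (1.0, 0.5, 0.2):
        rate = proportion_mistaken(alpha, beta, p0)
        print(f"p0 = {p0:.1f}: proportion mistaken = {rate:.3f}")
```

With these made-up inputs the overall error proportion equals alpha (0.05) only when p0 = 1, that is, when every study has a true H0. As p0 shrinks, the result moves toward beta instead.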


Lastly, is there any way this sentence can be saved? To me it seems wrong in so many ways that it can't be salvaged:

If the test is significant at this level, it means that researchers are 95% confident that the results from their studies indicate a real difference and not just a random fluke.
I am thinking of emailing the author of this textbook, so I'm wondering:
  1. Do you agree with the way I am reading these sentences (through an undergraduate's eyes)?
  2. Do you agree with my math that the "proportion of mistaken studies" could in principle be calculated as p0*alpha + p1*beta? And that the author is simply substituting alpha itself for this value?
  3. Do you have any suggestions how this could be worded better, taking into account that the undergraduates reading it:
    • may not have had a statistics class
    • may not be familiar with Type I and Type II errors
    • may only think in binary terms of "correct/incorrect" conclusions, rather than in terms of conditional probabilities like "correct/incorrect IF H0 TRUE" and "correct/incorrect IF H0 FALSE"
 

Karabiner

TS Contributor
#2
I agree with 1) and 2). Regarding 3) I have no idea.

The frequentist paradigm, which is still the standard in teaching and practising statistics,
will be replaced by the Bayesian paradigm in the not too distant future, I suppose, and
the often useless yes/no thinking in terms of absolutely false/correct hypotheses will
mostly disappear.

With kind regards

Karabiner
 

ondansetron

TS Contributor
#3
I only skimmed the OP and so won't comment directly on that, but rather on the idea that Bayesian statistical theory will replace Frequentist theory. I disagree with this because it implies that Frequentist theory is useless, which it clearly is not. I think some universities will still teach primarily from a Frequentist or a Bayesian perspective, but I think more programs will present both philosophies earlier and in a relatively more integrated fashion; similarities and differences will be highlighted and special cases of equivalence will be taught. As a factual matter, in an actualized decision, someone is either correct or incorrect; the element of unknown truth doesn't change that, but the Bayesian perspective on probability helps bridge the gap in summarizing evidence that supports this after-the-decision uncertainty (in my mind).

To say that Bayesian methods are generally good and Frequentist methods generally bad is a misstatement on a large scale (one I know you didn't explicitly make, so this isn't necessarily directed at you). Each has been useful in solving important problems throughout history, and they need to be recognized as complementary approaches that help triangulate an answer. Each philosophy has approaches that are useless in various scenarios.
 

hlsmith

Not a robit
#4
Well, to continue the deviation, many of the newer "machine learning" / "Deep Learning" approaches haven't integrated Bayesian priors. I would say a big change will be the more frequent use of these latter algorithms, which haven't fully embraced Bayesian themes. The tipping point may be evolving/changing. Ten years ago I would have said you were right, Karabiner, but now I would question such an accelerated timeline.