How to correctly interpret the results of multiple papers that use subsets of one large data set.

Hello all,

I'm researching the effect of cognitive impairment on the vocational outcomes of people who have experienced their first episode of psychosis (FEP). Cognitive impairment is typically measured using a battery of cognitive tests. Results from these batteries may be standardised and averaged into an overall measure of cognitive functioning, or analysed into a number of factors/dimensions of cognitive functioning. Vocational outcomes may be measured by a simple 'employed y/n' question, hours of employment over the past x months, or complexity of vocational role.

In this area of research, I've come across several papers published by the same group of researchers. They collected a lot of data from a large sample of people who had experienced their FEP. Each paper considers a different research question, so each paper uses a different subset of the larger data set. I'm trying to pull the findings from these studies together, along with many other studies in the area, to establish what we know so far about the effect of cognitive impairment on vocational outcomes in FEP. However, since the results of these papers rely on the same larger pool of data, I'm not sure if it's OK to treat the papers as if they're separate studies. That is, is it OK to accept the results of each of these papers as significant at the p<0.05 level, or should some adjustment be made, given they all rely on the same larger sample of data?

Any help you can provide is much appreciated! Let me know if my question is unclear.

Thanks so much!!


Not a robit
Yes, testing multiple hypotheses within a single study sample carries a risk of false discoveries (type I errors). If the authors did not make any corrections for multiple comparisons, you may want to take that into account when weighing their results.
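
To give a rough idea of what such a correction does, here is a minimal Python sketch of the Holm-Bonferroni adjustment, which is only one of several possible methods. The p-values are made up purely for illustration and do not come from any of the papers, and the function name `holm_adjust` is hypothetical. Whether it is appropriate to treat results from separate papers as one family of tests is a judgment call, so take this as a sketch under that assumption rather than a recommendation.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values, in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted p-values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values, one per paper drawn from the same sample (illustration only).
example_p = [0.04, 0.01, 0.03]
adjusted_p = holm_adjust(example_p)

# Chance of at least one false positive across m independent tests at alpha = 0.05.
alpha = 0.05
familywise_error = 1 - (1 - alpha) ** len(example_p)

print(adjusted_p)
print(familywise_error)
```

With these made-up values, all three raw p-values are below 0.05, but after adjustment only the smallest one still is. The last calculation assumes the tests are independent, which results from the same sample generally are not, so it is only a rough guide to how quickly the error rate grows.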

Secondly, I was unsure whether you were referring to this as well, but you wouldn't want to pool results on the same metric from the same general population, because those results are not independent.
Thanks for your reply, hlsmith. It's really helpful. The authors did not make any corrections, so I will definitely consider that.

Could you please explain to me what 'pool results on the same metric' means? I'm working on a literature review for my thesis at the moment. I'm explaining the findings from these papers, and eventually would like to say 'given all the findings from these papers, this is what we know so far about the effect of cognition on the vocational outcomes of people with FEP'. I'm not actually manipulating the data from these studies or running my own statistical analysis on them. I would just like to draw a correct inference from their (collective) findings.