When does the multiple comparisons problem become a consideration?

Typically in research you collect data to test a specific a priori hypothesis. But nowadays many people generate hypotheses and then access appropriate datasets in online databases that have already been sampled and stored for general use.

Is it valid to keep generating and testing new hypotheses on the same dataset even though the data weren't specifically collected to test those hypotheses, or does this produce a multiple comparisons problem? I think it's perfectly valid as long as the data can reasonably be used to test the hypotheses, but I disagreed with a friend who suggested that this contributes to the replicability crisis (I suppose due to inflated Type I error, and thus more false positives). It got me wondering about when exactly the multiple comparisons/tests issue becomes a consideration, and I realised I don't actually fully understand it.


Yup, this is a big issue. Any time you run more than one test you should correct for it; even a single study comparing three groups needs a correction right from the start. It really is just that simple. If you are using an existing dataset, set your significance level low and draw no firm conclusions beyond saying that additional studies are needed.
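To make the "correct for it" step concrete, here is a minimal sketch of two standard family-wise error rate corrections, Bonferroni and Holm, applied to a hypothetical list of raw p-values from separate tests on the same dataset (the p-values are made up for illustration):

```python
def bonferroni(pvals):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """Holm step-down adjustment: less conservative than Bonferroni,
    but still controls the family-wise error rate."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices sorted by p-value
    adjusted = [0.0] * m
    running_max = 0.0  # enforces monotonicity of adjusted p-values
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

# Four hypothetical tests: each raw p-value looks "significant" at 0.05,
# but after correction only some survive.
raw = [0.005, 0.010, 0.030, 0.040]
print([round(p, 6) for p in bonferroni(raw)])  # [0.02, 0.04, 0.12, 0.16]
print([round(p, 6) for p in holm(raw)])        # [0.02, 0.03, 0.06, 0.06]
```

Note how two of the four raw p-values that were below 0.05 no longer are after either correction, which is exactly the inflation of false positives that uncorrected multiple testing produces.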

An equally large issue is that the dataset wasn't designed for the new question. The test of the new hypothesis may therefore be underpowered, not all confounding variables may have been collected, and there could be some form of selection bias in the sample.
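To illustrate the underpowering point, here is a rough sketch of the power of a two-sided two-sample z-test as a function of group size, using the normal approximation; the effect size and sample sizes are hypothetical, chosen only to show how power collapses when a repurposed dataset leaves you with fewer usable observations than a purpose-built design would:

```python
import math

def z_power(effect_size, n_per_group, alpha_z=1.959963984540054):
    """Approximate power of a two-sided two-sample z-test with equal group
    sizes, for a standardized effect size; alpha_z is Phi^{-1}(0.975),
    hard-coded to keep this stdlib-only."""
    phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))  # standard normal CDF
    ncp = effect_size * math.sqrt(n_per_group / 2)          # noncentrality parameter
    return phi(ncp - alpha_z) + phi(-ncp - alpha_z)

# Hypothetical effect size d = 0.3:
print(round(z_power(0.3, 200), 2))  # purpose-built design, n = 200/group -> 0.85
print(round(z_power(0.3, 40), 2))   # repurposed subsample, n = 40/group  -> 0.27
```

With only 40 usable observations per group, power drops to roughly a quarter, so most true effects of that size would be missed, while any "significant" results found by searching across many such tests are disproportionately likely to be false positives.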

These are contributors to the crisis, along with investigators trying too hard to find significance by deviating from a priori protocols.