When to use derivation and validation cohorts?

Hi everyone,

I was wondering if there are any general rules on when it's good to use a derivation and a validation cohort on a dataset? So far I have only seen them used for diagnostic tests.

Also, if I use two datasets from two different studies to do this, do the datasets need to have the same primary endpoint? E.g. I am planning to do the derivation using one dataset and the validation using another. The primary endpoint for my research is 30-day mortality. One dataset records 30-day mortality; the other only records in-hospital death within 30 days (if patients die after they are discharged, it is NOT recorded). Would it be correct to use one as the validation cohort and the other as the derivation cohort for this?

Many thanks.


Less is more. Stay pure. Stay poor.
You generally want an apples-to-apples comparison. I am sure you could list out all of the limitations of comparing two groups with different follow-up (e.g., do patients have approximately the same length of stay, and if not, why?).

You might be able to compare them if nearly everybody who dies does so within a day or two, while still in hospital; that way you would not be missing many post-discharge deaths.
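
One rough way to check this, assuming the dataset with full 30-day follow-up also records length of stay and day of death (that is an assumption on my part, and the function name and record layout below are made up for illustration), is to see what share of its 30-day deaths happened before discharge. A minimal sketch in Python:

```python
def in_hospital_share(records, window=30):
    """Fraction of deaths within `window` days that occurred before discharge.

    `records` is a list of (days_to_death, length_of_stay) pairs, where
    days_to_death is None for patients still alive at the end of the window.
    This layout is hypothetical, not taken from either study.
    """
    # Keep only patients who died within the follow-up window.
    deaths = [(d, los) for d, los in records if d is not None and d <= window]
    if not deaths:
        return None  # no deaths, so the share is undefined
    # A death counts as in-hospital if it happened on or before the discharge day.
    in_hospital = sum(1 for d, los in deaths if d <= los)
    return in_hospital / len(deaths)


# Toy data for illustration only (not real patients).
example = [
    (2, 5),      # died on day 2, still in hospital
    (1, 3),      # died on day 1, still in hospital
    (20, 7),     # died on day 20, after discharge on day 7
    (None, 4),   # alive at day 30
    (None, 10),  # alive at day 30
]

share = in_hospital_share(example)
print(f"Share of 30-day deaths occurring in hospital: {share:.2f}")
```

If that share is close to 1, the in-hospital endpoint is probably a reasonable stand-in for 30-day mortality; if a noticeable fraction of deaths happen after discharge, the two endpoints are measuring different things and the validation would be hard to interpret.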