# How can I compare correlation between two data sets

##### New Member
Hi all,

I would be very grateful for any help people can offer with the problem below.

I'm doing a study comparing actual vs predicted outcome in a cohort of patients.

I essentially have two data sets from the same cohort that I would like to compare. All of these patients underwent a medical procedure (for simplicity's sake, let's say removal of their appendix). I have data confirming their actual outcome (in terms of months of survival). There is also a grading/prognostic score that can be used to calculate their predicted outcome based on a number of factors. I therefore have data for the cohort on each individual's actual vs predicted survival in months.

I've been trying to find a way to determine whether these two data sets correlate with each other. In practical terms, how well do actual and predicted survival correlate? If the pre-procedure score suggests a patient will have a poor outcome (few months of survival), is this borne out in their actual survival?

I've heard mention of using a correlation coefficient or an R-squared test to assess this. Are these the most appropriate tests? If not, why not, and is there a better test I could use instead?
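For reference, a minimal sketch of the two quantities mentioned above, in plain Python with illustrative function names (not from any library). Pearson's r measures how closely the (predicted, actual) points lie along a straight line; for a simple least-squares line of actual on predicted survival, R-squared equals r squared, so the two carry the same information. Both need an exact actual survival time for every patient included.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared_linear_fit(x, y):
    """R^2 of the least-squares line y = intercept + slope * x, from residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical example values (months), not real study data.
predicted = [6, 12, 18, 24, 36]
actual = [4, 15, 14, 30, 33]
r = pearson_r(predicted, actual)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}, R^2 = {r_squared_linear_fit(predicted, actual):.3f}")
```

Because Pearson's r works on the raw month values, a few very long survival times can dominate it, which is one reason to consider a rank-based measure instead.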

Thank you!

#### Karabiner

##### TS Contributor
How many patients were included, and have all patients died within the observation period?

With kind regards

Karabiner

##### New Member
Around 200, and not all patients have passed away (but every patient still has a predicted survival outcome).

#### Karabiner

##### TS Contributor
Assuming that the observation period is the same for all patients,
there is a category "survived beyond the maximum observation
time" and, correspondingly, "predicted survival time is beyond the
maximum observation time". You can include these cases if you rank
order each of your variables and assign them the highest rank. You
can then perform a Spearman rank correlation, which is suited to
ordinal data. It tells you the degree of monotonic relationship
between predicted and observed survival.
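A minimal sketch of the approach described above, assuming a common maximum observation time (`max_time`, a hypothetical parameter) and using `math.inf` to mark a patient still alive at the end of observation. Values beyond `max_time` are capped so that they all share the top (tied) rank; Spearman's rho is then the Pearson correlation of the ranks, with tied values given their average rank. Function names are illustrative, not from any particular library.

```python
import math


def average_ranks(values):
    """Rank values 1..n (ascending); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_with_cap(predicted, observed, max_time):
    """Spearman's rho, treating anything beyond max_time as one tied top category."""
    pred_capped = [min(p, max_time) for p in predicted]
    obs_capped = [min(o, max_time) for o in observed]  # math.inf -> max_time
    return pearson_r(average_ranks(pred_capped), average_ranks(obs_capped))


# Hypothetical values in months; math.inf = alive at end of observation.
predicted = [6, 12, 30, 48, 70]
observed = [3, 10, 40, math.inf, math.inf]
print(spearman_with_cap(predicted, observed, max_time=60))
```

If SciPy is available, `scipy.stats.spearmanr` on the capped values should give the same coefficient, since it also assigns average ranks to ties.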

With kind regards

Karabiner

##### New Member
The observation period will not be the same for every patient, as the data were collected over a few years. Would the above still be appropriate?

#### Karabiner

##### TS Contributor
I am inclined to assume that this would still work, but you will have difficulty
presenting the results if no reference exists for this analysis (and I do not have
one). The main problem would be the number of survivors. How large is that
proportion? And perhaps you should exclude all patients with very short
possible observation periods.
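One possible way to combine these suggestions, sketched under assumptions that are not from the thread: each patient record is a dict with hypothetical keys `predicted`, `observed` (months from procedure to death, or to last contact if still alive), `died`, and `potential_followup` (months from the procedure to the end of data collection). The sketch reports the proportion of survivors, then picks a common horizon, drops patients whose possible observation time is shorter than it, and caps both values at the horizon so the rank-based approach from the earlier post can be applied to the remaining patients.

```python
def survivor_proportion(patients):
    """Fraction of patients still alive at their last follow-up."""
    if not patients:
        raise ValueError("no patients")
    return sum(1 for p in patients if not p["died"]) / len(patients)


def restrict_to_horizon(patients, horizon):
    """Return (predicted, observed) pairs capped at `horizon` months.

    Patients whose possible observation time is shorter than the horizon are
    excluded, as are survivors last seen before the horizon (assumed lost to
    follow-up, so their status at the horizon is unknown).
    """
    pairs = []
    for p in patients:
        if p["potential_followup"] < horizon:
            continue  # too little possible observation time
        if not p["died"] and p["observed"] < horizon:
            continue  # alive but not followed up to the horizon
        # Anything beyond the horizon becomes one tied top value.
        pairs.append((min(p["predicted"], horizon), min(p["observed"], horizon)))
    return pairs


# Hypothetical records (months), not real study data.
patients = [
    {"predicted": 10, "observed": 8, "died": True, "potential_followup": 50},
    {"predicted": 40, "observed": 30, "died": False, "potential_followup": 30},
    {"predicted": 5, "observed": 12, "died": False, "potential_followup": 12},
    {"predicted": 20, "observed": 15, "died": False, "potential_followup": 40},
]
print(survivor_proportion(patients))
print(restrict_to_horizon(patients, horizon=24))
```

The choice of horizon is a trade-off: a longer horizon keeps more of the survival range but excludes more of the recently treated patients.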

With kind regards

Karabiner