# Using a correlation as a measure of how well a heuristic predicts an outcome?

#### kutlak.roman

##### New Member
Hello,
I am preparing an experiment in which I would like to test a heuristic. This heuristic predicts how well certain facts are known. The values returned by the heuristic are between 0 and 1.

I was thinking of testing this heuristic by asking people whether they know a particular fact and comparing the percentage of people who know it with the output of the heuristic. Participants can answer True, False, or Don't Know.

When my heuristic predicts that a fact is more likely to be known (value closer to 1), a higher percentage of people should say that the fact is True.

Example to clarify this:
F1: Isaac Newton was a physicist.
F2: Isaac Newton was a warden of the Royal Mint

Heuristic(F1) = 0.9
Heuristic(F2) = 0.1

Suppose that presenting the above facts to 10 people yields the following results:
| Fact | True | False | Don't Know |
|------|------|-------|------------|
| F1   | 8    | 0     | 2          |
| F2   | 1    | 2     | 7          |

Looking only at the True responses indicates that 80% of people know F1 and 10% of people know F2.
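
To make the arithmetic explicit, here is a minimal Python sketch that turns the response counts from the table into the proportion of True answers per fact. The names `responses` and `proportion_true` are only illustrative, and treating "True" as "knows the fact" is the assumption stated above.

```python
# Response counts from the example table (10 participants per fact).
responses = {
    "F1": {"True": 8, "False": 0, "Don't Know": 2},
    "F2": {"True": 1, "False": 2, "Don't Know": 7},
}


def proportion_true(counts):
    """Share of participants who answered True for one fact."""
    total = sum(counts.values())
    return counts["True"] / total


# Assumption: answering True is taken to mean the participant knows the fact.
known = {fact: proportion_true(counts) for fact, counts in responses.items()}
print(known)  # {'F1': 0.8, 'F2': 0.1}
```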

To measure how 'good' my heuristic is, I will look at the Spearman correlation between the scores of the heuristic and the percentages of people who knew the fact.
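
As a rough sketch of that calculation in plain Python: Spearman's correlation is the Pearson correlation of the ranks, with tied values sharing the mean of their ranks. Only the F1 and F2 values come from my example; the other three pairs are invented purely to have enough points to rank, and the function names are my own.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# First two pairs are F1 and F2 from the example; the rest are hypothetical.
heuristic = [0.9, 0.1, 0.6, 0.4, 0.7]
observed = [0.8, 0.1, 0.7, 0.3, 0.5]
rho = spearman(heuristic, observed)
print(round(rho, 3))  # 0.9
```

In practice a statistics package would do the same job; the hand-rolled version is only there to show what is being computed.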

I would like to ask whether this seems like a plausible way of evaluating it and whether I have missed any obvious errors from a statistical point of view. I looked for similar threads in this forum but didn't have much luck.

Thanks,
Roman Kutlak

#### d21e7x11

##### New Member
Hi Roman,

It seems to me the logic is wrong here:

For "F2: Isaac Newton was a warden of the Royal Mint", 10% of people said it's true, so it means 10% of people don't know F2, whereas you concluded that "...Looking only at the True responses indicates that ... 10% people know F2."

#### kutlak.roman

##### New Member
Hi d21e7x11,

Thanks for your response. I am not sure I understand why you concluded that 10% of people don't know F2. The point I was trying to make is that both F1 and F2 are true ("In 1696, Newton was appointed warden of the Royal Mint,..." BBC - History section). While most people know F1, hardly anyone knows F2, and this should be reflected in the percentage of people who agree with the statement. Any further comments are welcome.

Regards,

Roman

#### Karabiner

##### TS Contributor
You mean you have a set of questions F1, F2, ..., Fn, and then you calculate the correlation between predicted % and actual % of correct answers? One problem with correlations is that they represent shape, not distance. For example, the pairs predicted/actual:
10% / 40%
15% / 45%
30% / 60%
50% / 80%
produce a perfect correlation, but obviously the prediction is not very accurate.
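
A small Python sketch of this, using exactly the four pairs listed: the correlation comes out as 1 while every prediction is off by 30 percentage points. The helper `pearson` and the choice of mean absolute error as the "distance" are only illustrative.

```python
# Predicted / actual percentages from the four pairs listed above.
predicted = [10, 15, 30, 50]
actual = [40, 45, 60, 80]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Correlation only looks at how the two series move together.
r = pearson(predicted, actual)

# Mean absolute error measures how far each prediction is from the outcome.
mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

print(round(r, 6))  # 1.0
print(mae)          # 30.0
```

Since the ordering of the pairs is identical in both columns, a rank correlation such as Spearman's is also 1 here.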

Regards

K.

#### d21e7x11

##### New Member
Hi Roman. Yes, you're right, I misunderstood that point.

#### kutlak.roman

##### New Member
Hello Karabiner, that is an excellent point. It didn't even cross my mind; I guess I'll have to think about that one. Thanks.

#### kutlak.roman

##### New Member
Hi. I guess the way to minimise over- or under-valuing the heuristic is to test it on enough data. The numbers I get from the heuristic don't necessarily represent percentages. They can correspond to probabilities or other measures. What I want from my heuristic is to tell me which facts are more likely to be known. Telling me how many people will know it would be a bonus but it is not needed for the application. After thinking it through I decided that correlation is probably the best I can do.