Can I apply a proportion test between two samples from the same database?

Cerm

New Member
#1
Hi,

I am going to first provide a background on the scenario as my thinking might be flawed from the outset:
I am managing a database where forms come in and if it's missing a piece of information (address for example) a clever business algorithm runs and fills in the gap. It was tested a couple of years ago and derives the correct value (let's say) 85% of the time.

I want to re-run the same test and see if the algorithm is performing as advertised. So through a lot of reading I am making the assumption that:

I can apply some kind of proportion test such as a Z-test
AND
The sample I take and check now is independent from the one done a couple of years ago.

Can I do this? Something feels off because it's not really two distinct groups. I suppose I wanted to make a conclusion such as:

"After sampling 400 address derivations, the accuracy was 67%. By using a Z-test for proportion, it can be shown that the performance is significantly different."

Thanks for the assistance,
 

Karabiner

TS Contributor
#2
It is not clear whether you want to compare two samples (old test sample vs. new test sample),
or if you just want to know whether the proportion in the new sample is statistically significantly
different from a reference value (i.e. 85%). If you do the latter, you can perform a binomial
test with expected proportion 0.85.
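
As a rough illustration of the binomial test against a fixed reference value, here is a minimal Python sketch using only the standard library. The function name binomial_test_two_sided is made up for this example, and the count of 268 correct out of 400 is an assumption taken from the illustrative 67% figure in the first post, not a real result. The two-sided p-value is computed by summing the probabilities of all outcomes that are no more likely than the observed one, which is one common convention for an exact two-sided test.

```python
from math import comb


def binomial_test_two_sided(k, n, p0):
    """Exact two-sided binomial test p-value.

    Sums the probabilities of every outcome that is no more likely
    under H0 (true proportion = p0) than the observed count k.
    """
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so floating-point ties are counted.
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-7))
    return min(1.0, p_value)


# Illustrative numbers only: 400 forms checked, 268 correct (67%).
n_checked = 400
n_correct = 268
p_value = binomial_test_two_sided(n_correct, n_checked, 0.85)
print(f"observed accuracy = {n_correct / n_checked:.2f}, p-value = {p_value:.3g}")
```

A small p-value here would only say that the new accuracy differs from the fixed value 0.85. It treats 85% as exact and ignores the sampling error in the original test.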

With kind regards

Karabiner
 

Cerm

New Member
#3
So I want to test the old sample versus the new sample and see whether there is a significant difference between the results. (Maybe the algorithm has degraded or perhaps the information it retrieves has improved). Does it meet the condition of independence if I make sure the forms I test now don't have any overlap with the forms tested a couple of years ago? So I can go on to use a Z-test for two proportions or something similar?
Thanks for the help.
 

Karabiner

TS Contributor
#4
You had n= ? forms in the test a couple of years ago, and you have n=400 forms now.
If they do not overlap, you can do a test for independent samples, e.g. Chi² test of
association for a 2x2 table (variables "sample" old/new, and "result" correct/incorrect).
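
The 2x2 test described above could be sketched in Python roughly as follows, using only the standard library. The function name chi2_2x2 and the counts in the example call are placeholders (the old sample size is not stated at this point in the thread). This version computes the Pearson statistic without a continuity correction, which is an assumption of the sketch; statistical packages often apply Yates' correction to 2x2 tables by default.

```python
from math import erfc, sqrt


def chi2_2x2(correct_old, n_old, correct_new, n_new):
    """Pearson chi-squared test of association for a 2x2 table
    (rows: old/new sample, columns: correct/incorrect), uncorrected."""
    observed = [
        [correct_old, n_old - correct_old],
        [correct_new, n_new - correct_new],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Placeholder counts: 170 of 200 correct (old) vs. 268 of 400 correct (new).
chi2, p_value = chi2_2x2(170, 200, 268, 400)
print(f"chi2 = {chi2:.2f}, p-value = {p_value:.3g}")
```

The expected counts should all be reasonably large (a common rule of thumb is at least 5) for the chi-squared approximation to be trusted.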


With kind regards

Karabiner
 

Cerm

New Member
#5
Great! Yes, the Chi-squared test was the other one that made sense. Glad I'm on the right track. And yes, the first test had n=100, so there is a bit of a discrepancy in sample size, but hopefully it is a large enough sample to play around with a bit.
Thanks for the help Karabiner.
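
For reference, the Z-test for two proportions mentioned earlier in the thread could be sketched in Python like this, with the sample sizes from the discussion (n=100 old, n=400 new). The counts of 85 and 268 correct are assumptions derived from the quoted 85% and the illustrative 67%, and the function name two_proportion_z_test is made up for this example. It uses the pooled standard error and a two-sided p-value.

```python
from math import erfc, sqrt


def two_proportion_z_test(correct_a, n_a, correct_b, n_b):
    """Two-sided z-test for the difference between two independent
    proportions, using the pooled estimate for the standard error."""
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Assumed counts: 85 of 100 correct (old test), 268 of 400 correct (new test).
z, p_value = two_proportion_z_test(85, 100, 268, 400)
print(f"z = {z:.2f}, p-value = {p_value:.3g}")
```

The square of this z statistic equals the Pearson chi-squared statistic for the same 2x2 table when no continuity correction is applied, so the two approaches give the same two-sided p-value.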