Random tests of data - probability

#1
Hi, I don't do much stats; I'm mostly in software development.

I've got a bunch of documents that should have been saved to individual accounts, but some seemingly random (human) errors mean that the wrong account sometimes gets a document. The documents are in thousands of folders, and I have some people who can look in those folders and check, by reading them, that the documents are the right ones.

We've tested a few and found some errors. I'm sure there's an equation that would tell me how many folders I'd need them to test in order to estimate the proportion of folders with errors, to help us decide whether to accept the risk (if it's a low proportion).

I've got about 8,000 folders, so if I checked 50 and found an error rate of 1 in 50, how accurate would that likely be? And how much more certainty would testing 100, 200 folders, etc. give me?

I'm sure there is a standard analysis equation I can use, but trying to research any kind of stats after not doing any for about 40 years is overwhelming. I can't seem to find what I need without knowing the name of it first.

Also, am I right to be checking folders? Would checking documents at random be a better approach? (There are varying numbers of documents per account, with a mean of 8.)

I really just need pointing in the right direction: resources, calculators, or literally just the name of the test I might use.

Thanks in advance.
Chris
 
#2
There is a difference between finding the proportion of folders with misplaced files and the proportion of files that are in the wrong folders. Your outline describes the first, but for various reasons your later suggestion is probably what you really want.
The technique you are looking for is a confidence interval for a proportion. Here is a calculator: http://www.sample-size.net/confidence-interval-proportion/
Choose, say, 100 files at random. If 3 are misplaced, then the calculator shows that the proportion of misplaced files in your sample is 0.03, or 3%. The true proportion is unknown, but we can be reasonably sure (95% sure) that the true proportion of misplaced files is somewhere between 0.006 (0.6%) and 0.085 (8.5%).
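Since you are in software development, here is a minimal Python sketch of that calculation. It assumes the interval quoted above is the "exact" (Clopper-Pearson) interval, which is one common method for a proportion; the function names are made up for illustration and are not taken from the calculator.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve_decreasing(f, target, iters=100):
    """Bisection on [0, 1] for a function f that decreases as p grows."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid   # f is still too large, so the root is to the right
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, conf=0.95):
    """Clopper-Pearson interval for x errors found in n sampled items."""
    alpha = 1 - conf
    # Lower limit: the p at which seeing x or more errors has probability alpha/2.
    lower = 0.0 if x == 0 else solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper limit: the p at which seeing x or fewer errors has probability alpha/2.
    upper = 1.0 if x == n else solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper

if __name__ == "__main__":
    lo, hi = exact_ci(3, 100)
    print(f"3 errors in 100 files: {lo:.3f} to {hi:.3f}")
```

For 3 misplaced files out of 100 this should come out close to the 0.006 to 0.085 range quoted above. It ignores the fact that the total number of files is finite, which is a safe simplification when the sample is a small fraction of the whole.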
The stats is now over. It is up to you and management to decide how many files you test; it depends on how certain you want to be. You can experiment with the calculator until you find the sample size that gives you a precision you are happy with. kat
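To get a feel for how the precision changes with sample size (the 50, 100, 200 question in post #1), the experiment can also be scripted. This is only a sketch: it uses the Wilson score interval, a closed-form alternative to the exact method that may give slightly different numbers from the calculator, and it assumes the same 2% observed error rate at every sample size purely for illustration.

```python
from math import sqrt
from statistics import NormalDist

def wilson_ci(x, n, conf=0.95):
    """Wilson score interval for x errors found in n sampled items."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    p = x / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

if __name__ == "__main__":
    # Hypothetical results: the same 1-in-50 error rate at each sample size.
    for n in (50, 100, 200, 400):
        x = n // 50
        lo, hi = wilson_ci(x, n)
        print(f"{x} errors in {n}: {lo:.3f} to {hi:.3f} (width {hi - lo:.3f})")
```

The interval narrows as the sample grows, but with diminishing returns: roughly, halving the width takes about four times as many checks.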