Sample Size Methodology

#1
Hello all,

I am calibrating a name matching algorithm that matches misspelled names to correctly spelled names. For example, it should match "Jhon Deo" to "John Doe". I am testing many different kinds of name variation: switched letters, abbreviations, rearranged words, vowel swaps, letter-to-number substitutions, etc.
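To make the variation types concrete, here is a minimal sketch of a few of them in Python. The function names and the letter-to-number mapping are hypothetical, chosen only for illustration; they are not the actual generators used in this testing.

```python
# Hypothetical variation generators, for illustration only.

def swap_adjacent(word, i):
    """Swap the characters at positions i and i + 1 (switched letters)."""
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def rearrange_words(name):
    """Reverse the order of the words in a name."""
    return " ".join(reversed(name.split()))

def abbreviate_first(name):
    """Reduce the first word to an initial followed by a period."""
    first, *rest = name.split()
    return " ".join([first[0] + "."] + rest)

# Assumed mapping; any real test set may use a different one.
LEET = str.maketrans({"o": "0", "e": "3", "i": "1", "a": "4", "s": "5"})

def letters_to_numbers(name):
    """Replace selected lowercase letters with look-alike digits."""
    return name.translate(LEET)

print(swap_adjacent("John", 1), swap_adjacent("Doe", 1))  # Jhon Deo
print(rearrange_words("John Doe"))                        # Doe John
print(abbreviate_first("John Doe"))                       # J. Doe
print(letters_to_numbers("John Doe"))                     # J0hn D03
```

Each generator turns one clean name into one test case, so the total number of tests grows with the number of names multiplied by the number of variations applied to each.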

I am wondering how large a sample (the number of names) I should use to conduct this testing. There is an important list of suspicious names, maintained by a governing body, that is ~6,000 entries long. Using a 95/50/5 sampling methodology (95% confidence level, 50% assumed proportion, 5% margin of error), that puts me at 362 names. That is a large sample given my technological limitations.
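As a sanity check on the 362 figure, here is a minimal sketch of the calculation. It assumes that "95/50/5" means a 95% confidence level, a 50% assumed proportion, and a 5% margin of error, and that the result comes from Cochran's sample size formula with a finite population correction. The function name sample_size and its parameters are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size(population, confidence=0.95, proportion=0.5, margin=0.05):
    """Cochran's formula with finite population correction (assumed method)."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = z ** 2 * proportion * (1 - proportion) / margin ** 2
    # Shrink it to account for the finite list being sampled.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

print(sample_size(6000))  # 362
```

Under these assumptions the uncorrected size is about 384, and the correction for a list of 6,000 entries brings it down to 362.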

To be clear, I am currently sending all 362 names through each type of name variation (abbreviations, rearranged names, etc.), which results in 15,000 different tests.

Is there a different sampling approach or test population methodology that would be suitable for this exercise? Any information, or simply links to additional reading, would be greatly appreciated! Thank you!