Laser dinghy: what is the probability that sail numbers share the same final four digits?

General question: given a series of n random numbers, of 6 digits each, what is the probability that two of those numbers have the same final 4 digits (in the same order)?

This question is from a real case scenario, taken from the Laser dinghy races.

Laser sailboats (now ILCA) have a 6-digit hull number, which is replicated on their sails (the so-called sail number).
The final four digits are black, while the first two digits are either red or blue. Here is an example:


ILCA/Laser races are usually very crowded. It is not uncommon to have more than 200 boats racing.
In most cases, there are two or more Race Officials (ROs) who take note of the finish order. (GPS trackers are not accurate enough... yet.)
Taking note of such a large quantity of 6-digit numbers is almost impossible, especially in severe weather conditions.
Therefore, ROs usually take note of the final four digits only (i.e. the black numbers only).
But sometimes their duty is complicated by the fact that two different boats, with different sail numbers, share the same final four digits. (Example: 170695 and 210695 both end in 0695.)

I was wondering, what is the probability that in a fleet of 200 boats, two of them have the same final four digits?
And what is the probability that three of them have the same final four digits?
Is it relevant that nowadays the sail numbers are concentrated in the range from 195000 to 221000, with a peak of boats with sail numbers between 213000 and 218000?
Somebody told me: "you shouldn't care... after all the probability is only 1 over 10000".
It looks to me like a gross mistake.
Isn't this problem similar to the "Birthday Paradox"? If yes, how can we translate the relevant formulas of that paradox to this specific case?

Thank you indeed.


Less is more. Stay pure. Stay poor.
There is no paradox; it is just a basic probability problem. It is much like a deck of cards: the probability of drawing a club is 1/4, since 13 of the 52 cards are clubs. We can ask any such question as long as the deck is complete. Your deck may not be complete if not every number in the range is accounted for. So you can use the same rules, but the results will only be approximations and will require some assumptions. The key assumption is that there is no systematic cause of missingness that would make two similar numbers more or less likely to be present. This seems reasonable enough. Given this information, do you want to try to answer your question and post your solution?

However, you have a two-stage sampling frame. You randomly sample 200 numbers from the range (~30,000), then repeat the sampling many times and see how often two of the numbers have the same last four digits. Another way to do this may be to put a bigger weight on the range you listed. However, if you only care about the last four digits and the overall range is pretty large, none of this should really matter, even if the range does not end on a multiple of 10,000. So I think things may default back to something very close to the top scenario, except that you don't care about the suit, just whether they are the same. So the first number doesn't matter, only that the second matches it, which gives 1/10,000 for any single pair of boats.

@Dason - guide us here.


Ambassador to the humans
It certainly does sound birthday-paradox-like. It shouldn't be hard to modify the birthday paradox formula for this type of situation. Although, since you gave more information, we could also just do a straight simulation and compare results.


Ambassador to the humans
If we treat this as a normal birthday paradox situation you could use any sort of online calculator. Here's one:

You would set the "number of days" to 10000 and the group size to 200 or whatever size race you have.
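For anyone who would rather not rely on a calculator, the same computation can be sketched in a few lines of Python. This assumes the idealized birthday setup: every boat's last four digits are independent and uniform over the 10000 possible endings, which is not quite the real situation. The function name `p_shared_suffix` is just an illustrative choice. Under these assumptions a fleet of 200 comes out at roughly 86 to 87%, nowhere near 1 in 10000.

```python
from math import prod


def p_shared_suffix(n_boats: int, n_suffixes: int = 10_000) -> float:
    """Chance that at least two of n_boats share the same ending.

    Assumption: endings are independent and uniform over n_suffixes values
    (the classic birthday-problem setup with 10000 "days").
    """
    if n_boats > n_suffixes:
        return 1.0  # pigeonhole: a repeat is certain
    # Probability that all endings differ: (d/d) * ((d-1)/d) * ... * ((d-n+1)/d)
    p_all_distinct = prod((n_suffixes - k) / n_suffixes for k in range(n_boats))
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for fleet_size in (50, 100, 200):
        print(fleet_size, round(p_shared_suffix(fleet_size), 3))
```

The intuition is that 200 boats form 200 * 199 / 2 = 19900 pairs, and each pair has a 1 in 10000 chance of matching, so about two matching pairs are expected per race.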

I do think this isn't quite right though based on the other info you provided. A simulation of some sort might be in order.
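A rough sketch of such a simulation is below. It assumes each fleet is drawn without replacement and uniformly from the quoted range 195000 to 221000, so it ignores the reported peak between 213000 and 218000; weighting that band would need real data on how the numbers are distributed. The function name `simulate` and its default parameters are illustrative choices, not anything from an actual regatta dataset. It also counts how often three or more boats share an ending.

```python
import random
from collections import Counter


def simulate(n_boats=200, low=195_000, high=221_000, trials=2_000, seed=0):
    """Estimate P(some ending shared by >= 2 boats) and P(shared by >= 3 boats).

    Assumption: hull numbers are distinct and uniform on [low, high].
    """
    rng = random.Random(seed)
    pool = range(low, high + 1)
    pair_hits = triple_hits = 0
    for _ in range(trials):
        fleet = rng.sample(pool, n_boats)  # distinct hull numbers
        counts = Counter(number % 10_000 for number in fleet)  # last four digits
        biggest_group = max(counts.values())
        pair_hits += biggest_group >= 2
        triple_hits += biggest_group >= 3
    return pair_hits / trials, triple_hits / trials


if __name__ == "__main__":
    p_pair, p_triple = simulate()
    print(f"at least two boats share an ending:   {p_pair:.3f}")
    print(f"at least three boats share an ending: {p_triple:.3f}")
```

Because the range holds only about 26000 hull numbers, each ending occurs just two or three times in the pool, so the estimate need not match the plain birthday figure.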


Ambassador to the humans
Completely unrelated to the probability calculation, but if they have trouble with the number of digits to record they might want to increase the alphabet size.