There are 2500 workers (let’s use balls to represent them) in the workplace every day.

Workers fit into one of two categories - (1) those that are exposed to the hazard (red balls), and (2) those that are not exposed (black balls).

The workplace claims that no one is being exposed to the hazard - i.e., they assert that all 2500 balls are black, without checking whether this is true.

It is not possible to monitor all 2500 workers daily to verify whether or not they are exposed to the hazard (i.e., we cannot look at all 2500 balls).

Because we are dealing with people, who may or may not follow the rules the same way from day to day, the number of exposures to the hazard (i.e., production of red balls) could change from day to day (it is a random number). And, because of various engineered and administrative barriers, the number of red balls (out of the total 2500) on any given day is likely very small (single digits, or at most very low double digits).

If there are red balls amongst the 2500, what function would describe their number?

Again, we don’t know if any of the 2500 balls are red; and even if there are red balls, the number of red balls on any given day is random (and we don’t know their number). If instead of a daily sample, we sampled after a period of time (e.g., 30 days), it’s possible that the number of red balls would increase, but there is no guarantee, and any increase would likely be random.

As well, on some days, due to the type of work, more people are required to work near the hazard. This could run for several weeks, maybe even 4 months, and then the work changes and the number of workers in proximity to the hazard drops. The hazard itself is present at all times. These campaigns could increase the potential for worker exposure, but again, workers who follow the rules might not be exposed. So there is no guarantee here either.

My question is the following: “Is there a statistical model that can be used to represent this situation?”

Is there a probability function that describes it?

A colleague told me this can be represented by a hypergeometric probability distribution. They also said I should group the workers and treat those in close proximity to the hazard as a different population from the remainder.
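To make the colleague's suggestion concrete, here is a minimal sketch of the hypergeometric calculation for a single day. It assumes a fixed (but unknown) number of red balls on that day, a simple random sample drawn without replacement, and that a sampled red ball is always recognised as red. The function name, the sample size of 300 and the candidate red-ball counts are hypothetical choices for illustration only.

```python
from math import comb

def prob_no_red_in_sample(total, red, sample_size):
    """Hypergeometric P(X = 0): chance that a sample drawn without
    replacement contains no red balls."""
    # Ways to fill the whole sample from black balls only, divided by the
    # ways to draw any sample. comb() returns 0 when the sample is larger
    # than the number of black balls, so the probability is then 0.
    return comb(total - red, sample_size) / comb(total, sample_size)

TOTAL = 2500        # workers on site (from the question)
SAMPLE_SIZE = 300   # hypothetical sample size, for illustration only

for red in (1, 5, 10):  # hypothetical numbers of exposed workers
    p_miss = prob_no_red_in_sample(TOTAL, red, SAMPLE_SIZE)
    print(f"{red:2d} red: P(sample shows none) = {p_miss:.3f}")
```

Under these assumptions, the printed values show how often a sample would contain only black balls even though red balls are present. It says nothing yet about day-to-day variation or about the close-proximity group.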

Ultimately, I want to make a statement (within some confidence level, 95% seems typical) as to whether or not there are any red balls among the 2500 balls.
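One way to read the 95% target is: "if at least K balls were red, a sample of n balls would contain at least one of them with probability 0.95 or more". The sketch below searches for the smallest such n under the hypergeometric model. It again assumes a single-day snapshot with a fixed number of red balls and a simple random sample without replacement; `min_sample_size` and `max_miss_prob` are hypothetical names, and the values of K are illustrative.

```python
from math import comb

def min_sample_size(total, red, max_miss_prob=0.05):
    """Smallest sample size whose chance of containing no red ball is
    at most max_miss_prob, given `red` red balls among `total`."""
    if red == 0:
        return None  # nothing to detect, so no sample size can qualify
    for n in range(total + 1):
        # Hypergeometric P(X = 0) for a sample of size n.
        p_miss = comb(total - red, n) / comb(total, n)
        if p_miss <= max_miss_prob:
            return n
    return total

TOTAL = 2500  # workers on site (from the question)

for red in (1, 5, 10):  # hypothetical numbers of exposed workers
    print(f"{red:2d} red: sample at least {min_sample_size(TOTAL, red)} balls")
```

With a single red ball the miss probability is (2500 - n)/2500, so n has to reach 2375 before it falls to 0.05. A sample showing only black balls can therefore support a statement such as "fewer than K red balls", but not "no red balls at all", unless nearly every ball is inspected.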

I have the ability to sample the balls on a monthly basis.

But I don’t know how many balls to sample, and I don’t know how long to sample for. Would I stop sampling after I have collected a defined number of samples (e.g., 300)? Or after a defined period (e.g., 15 samples/month for 20 months)? I have the ability to sample monthly, but is monthly appropriate?

Can this even be modelled using statistics?