Determine probability of a sampled value being <= 40

Hi all! Brand new poster here, trying to think back to those stats classes that I hated at the time, but now seem fascinating. Ahhhh to be 18 again :)

Thank you for reading. I did see the "which test should I use?" post, but it's six years old and most of the links are broken.

I have a business problem. I need to know whether the drive time from one latitude/longitude point to another is less than 40 minutes. I have a script that calls a mapping service for drive times on a regular basis and has compiled many hundreds of samples. So my data set is a list of many hundreds of time values:

...and on...

My business has determined that a drive time longer than 40 minutes will make our driver late. We need to know the percentage chance of being late.

How can I determine the percentage chance that the next sampled drive time will be <= 40 minutes?

Please note I specifically did not say "the drive time itself will be <= 40," because we understand the mapping service is only a proxy.

Also, I've just finished a TON of reading about frequentist confidence intervals vs. Bayesian credible intervals. The latter has a definition more applicable to my case, but since I'm relying purely on sampled data, I'm willing to accept the former.
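For comparison, the Bayesian route is not hard for a pass/fail question. The sketch below is an illustration, not something proposed in the thread. It assumes each sampled drive time independently passes (<= 40 min) with an unknown probability p and puts a uniform Beta(1, 1) prior on p. After k passes out of n samples, the posterior for p is Beta(k + 1, n - k + 1). The posterior predictive probability that the *next* sample passes is then (k + 1)/(n + 2), which is exactly the quantity asked about. The function name and the 450-of-500 counts are hypothetical.

```python
import random


def beta_posterior_pass(n_pass, n_total, draws=50_000, seed=0):
    """Hypothetical helper. ASSUMES independent samples, each passing with unknown
    probability p, and a uniform Beta(1, 1) prior on p."""
    a = 1 + n_pass                # posterior alpha
    b = 1 + (n_total - n_pass)    # posterior beta
    # Posterior predictive P(next sample passes) = posterior mean of p = a / (a + b),
    # i.e. (k + 1) / (n + 2), Laplace's rule of succession.
    predictive = a / (a + b)
    # 95% equal-tailed credible interval for p, approximated by Monte Carlo draws.
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lower = samples[int(0.025 * draws)]
    upper = samples[int(0.975 * draws) - 1]
    return predictive, (lower, upper)


if __name__ == "__main__":
    # Hypothetical counts: 450 of 500 sampled drive times were <= 40 minutes.
    print(beta_posterior_pass(450, 500))
```

With several hundred samples the credible interval and a frequentist confidence interval for p will usually be numerically very close, so the choice matters more for interpretation than for the numbers.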

So this is basically a "which test should I use" question. Thanks so much for any thoughts!


TS Contributor
First, a formal remark: a test answers a question about your data, something like "is the average drive time less than 40 min?". Your question is a bit different. I can see two approaches:

1. Determine the distribution of your data and use it to calculate the probability of being over 40 min. E.g. if your data are normal with a mean of 35 and a standard deviation of 5, then 40 min is one standard deviation above the mean and your chance of being above 40 is roughly 16%. In this approach you would need to figure out the type of distribution and its parameters.

2. Just label any data point at or below 40 as PASSED and any data point above 40 as FAILED, then calculate the percentage failed and its confidence interval based on the Bernoulli (binomial) distribution.
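A minimal sketch of approach 1, assuming the drive times really are normally distributed; the function names and the 40-minute default are illustrative. Because drive times are positive and often right-skewed, the sketch also computes the plain empirical fraction above 40 for comparison. If the two disagree noticeably, the normal assumption is suspect and another distribution (or approach 2) is a better fit.

```python
import statistics
from statistics import NormalDist


def normal_late_probability(drive_times, threshold=40.0):
    """ASSUMES normally distributed drive times: fit the mean and the sample
    standard deviation, then return the model's P(drive time > threshold)."""
    mu = statistics.fmean(drive_times)
    sigma = statistics.stdev(drive_times)  # needs at least two samples
    return 1.0 - NormalDist(mu, sigma).cdf(threshold)


def empirical_late_fraction(drive_times, threshold=40.0):
    """Distribution-free comparison: fraction of observed samples above threshold."""
    return sum(t > threshold for t in drive_times) / len(drive_times)


if __name__ == "__main__":
    # The reply's example: mean 35, std dev 5, so 40 min is one std dev above the mean.
    print(1.0 - NormalDist(35, 5).cdf(40))  # about 0.1587
```

`statistics.NormalDist` and `statistics.fmean` need Python 3.8 or later.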

In both cases the underlying assumption is that drive times are independent random variables. I would definitely look for patterns first: does the day of the week or the time of day influence the drive times? Is there any link between successive drive times, e.g. after a late drive, is there a higher (or lower) probability of being late again? That sort of thing.
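One rough way to check the "late after late" question, assuming the samples are stored in the order they were collected, is to compare the late rate right after a late sample with the late rate right after an on-time sample. Similar values are consistent with independence; a large gap suggests the samples are linked. The function name is hypothetical, and this is a quick diagnostic rather than a formal test.

```python
def late_given_previous(drive_times, threshold=40.0):
    """Hypothetical helper. ASSUMES drive_times is in collection order.
    Returns (P(late | previous late), P(late | previous on time))."""
    late = [t > threshold for t in drive_times]
    pairs = list(zip(late, late[1:]))          # (previous, current) pairs
    after_late = [cur for prev, cur in pairs if prev]
    after_ok = [cur for prev, cur in pairs if not prev]

    def rate(flags):
        # NaN when there is no previous sample of that kind to condition on.
        return sum(flags) / len(flags) if flags else float("nan")

    return rate(after_late), rate(after_ok)
```

The same comparison can be repeated within each weekday/weekend and rush-hour/non-rush-hour group.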

Good luck
Yes, thanks. I realized last night that a lot of the tests answer questions about the mean. I don't really care about the mean, only about the chance of a sample being <= 40 minutes.

And we do break them up into weekday / weekend and rush hours / non-rush hours.
Currently we just label them pass/fail and find the proportion that passed. I thought we might spruce this up with some hardcore statistics.
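A sketch of how the existing pass/fail proportions could be given a confidence interval per group, in the spirit of approach 2. It uses the Wilson score interval, a standard interval for a binomial proportion that behaves better than the plain normal approximation when the pass rate is close to 0 or 1. The `(group, minutes)` input format, the group labels, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n == 0:
        raise ValueError("need at least one sample")
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def pass_rates_by_group(samples, threshold=40.0):
    """samples: iterable of (group_label, drive_time_minutes) pairs, where the
    label is a hypothetical encoding such as ("weekday", "rush").
    Returns {group: (n, pass_fraction, (low, high))}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [passes, total]
    for group, minutes in samples:
        counts[group][0] += int(minutes <= threshold)
        counts[group][1] += 1
    return {g: (n, k / n, wilson_interval(k, n)) for g, (k, n) in counts.items()}
```

Small groups (for example weekend rush hour) will get wide intervals, which is itself useful information about how far to trust that group's pass rate.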

I like your two suggestions and I'll look into them a little more. Thank you.