# How to compare a subset to a set - timing

#### chris.overend

##### New Member
I have a task (PARENT) that completes thousands of tasks every day, all of varying length. It takes hundreds of minutes to complete, and after a year I have 300+ data points. I also have a task (CHILD) that runs a varying subset (150, 2000, 1243, ...) of those thousands of tasks.

Changes are made to the CHILD tasks. To determine whether these changes are acceptable, I need to compare the CHILD's times against the PARENT's times to see if the CHILD is completing in an acceptable time frame. The acceptable time frame should be determined by the PARENT data. The PARENT does have some outlying data points.

I have never done any statistical analysis before.
My thought is that if I know that 98% (to exclude outliers) of the PARENT times, from shortest to longest, are within 4% of the mean or median, then I can compare the CHILD task time against the same PARENT subset and see whether it is inside or outside that 4%.
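The 98% / 4% idea above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `central_spread` and its `keep` parameter are made up for this sketch, the median is used as the center (the mean would work the same way), and the outliers are dropped by trimming equally from both ends of the sorted times.

```python
import statistics

def central_spread(times, keep=0.98):
    """Return (median, worst relative deviation) over the central `keep` share of times."""
    ordered = sorted(times)
    # Number of points trimmed from each end; rounding guards against float noise.
    drop = int(round(len(ordered) * (1 - keep) / 2, 9))
    core = ordered[drop:len(ordered) - drop]
    center = statistics.median(core)
    worst = max(abs(t - center) / center for t in core)
    return center, worst

# PARENT sample from the example in this post.
parent = [209.064, 212.289, 199.694, 203.198, 206.779,
          210.909, 207.152, 194.514, 194.23, 198.452]
center, worst = central_spread(parent)
within_4_percent = worst <= 0.04
```

With only ten values nothing is trimmed, and the furthest sample value sits a little over 5% from the median, so a 4% band would already be too tight for this small sample. The full 300+ points would give a more reliable picture.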

Example:

PARENT, time to complete 6000 tasks:
[209.064, 212.289, 199.694, 203.198, 206.779, 210.909, 207.152, 194.514, 194.23, 198.452]

CHILD, time to complete 150 tasks:
[60]

If I compare against the same 150 tasks in PARENT for the last 5 times they were completed, PARENT could be [54, 60, 50, 55, 46].

This smaller set of data does not seem to provide enough information to determine if the tasks are slower or faster as compared to the PARENT.
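One way to see why the small set is inconclusive is to express the CHILD time as a percentage of the PARENT median and compare it with how far the PARENT times themselves stray from that median. The sketch below assumes the median as baseline, and `percent_vs_parent` is a hypothetical helper name.

```python
import statistics

def percent_vs_parent(child_time, parent_times):
    """Percent difference of child_time from the median of parent_times."""
    baseline = statistics.median(parent_times)
    return (child_time - baseline) / baseline * 100.0

parent_subset = [54, 60, 50, 55, 46]  # PARENT times for the same tasks, from the example
child_time = 60                       # CHILD time from the example

diff = percent_vs_parent(child_time, parent_subset)
# Largest deviation of any PARENT run from the PARENT median, in percent.
parent_spread = max(abs(percent_vs_parent(t, parent_subset)) for t in parent_subset)
# If the CHILD difference is no larger than PARENT's own spread,
# these five points cannot tell a slowdown apart from normal variation.
inconclusive = abs(diff) <= parent_spread
```

Here the CHILD run is about 11% above the PARENT median, but the PARENT runs themselves vary by up to about 15% around it, so the difference is within normal variation for this sample.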

It would be nice to be able to show

| RUN | % slower vs PARENT |
|-----|--------------------|
| 1   | +4%                |
| 2   | +2%                |
| 3   | +5%                |
| 4   | -1%                |
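A report like this could be produced along the following lines. This is a rough sketch: `report` is a hypothetical function, it assumes each run is stored as a pair of the CHILD time and the matching PARENT times, and it uses the PARENT median as the baseline. Only the single CHILD run from the example is available here, so the output has one row.

```python
import statistics

def report(runs):
    """Build 'RUN | % slower vs PARENT' lines from (child_time, parent_times) pairs."""
    lines = ["RUN | % slower vs PARENT"]
    for number, (child_time, parent_times) in enumerate(runs, start=1):
        baseline = statistics.median(parent_times)
        percent = (child_time - baseline) / baseline * 100.0
        # '+' forces an explicit sign, so faster runs show as negative.
        lines.append(f"{number} | {percent:+.0f}%")
    return lines

# The one known CHILD run; further (child_time, parent_times) pairs would be appended per run.
runs = [(60, [54, 60, 50, 55, 46])]
print("\n".join(report(runs)))
```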