# Determine if a worker differs from typical performance?

#### jeremy_y

##### New Member
I'll start with some details about my specific situation. I've got a database that contains records of jobs, clients, and workers. My goal is to see if a given worker has a lower than average score when it comes to converting first-time jobs into recurring jobs.

I do this by taking all first-time jobs that a given worker has been assigned to, and I then check to see if any subsequent jobs exist for the same client. If there were subsequent jobs the worker gets a 1, otherwise they get a 0. This gives me something that looks like this:

Worker 1 [48 total records (first-time jobs)]
0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1
--
Worker 2 [56 total records (first-time jobs)]
1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1

I then take this data and calculate the overall mean: the sum of all the 1's and 0's divided by the total number of first-time jobs in my system (2,925). This gives me a mean of 0.38 (so 38% of all first-time jobs typically become recurring jobs). I then calculate the standard deviation, which in this case is 0.13.

I then look at each worker who has completed a minimum of 30 first-time jobs. This is so that I only analyze workers with a sufficiently large sample size, in order to increase confidence in the results.

I then come up with a mean score for each of these workers (the sum of the 1's and 0's, divided by that worker's total number of first-time jobs). Next, I convert this into a z score by subtracting the mean (for all the data) from the worker's mean score and dividing the result by the standard deviation. Finally, I put the results in a bar chart, which looks like this:

The dark bars are for any result with a z-score above 1 or below -1.
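As a minimal Python sketch of the steps described above: the two outcome lists are the example workers from this post, the overall mean (0.38) and standard deviation (0.13) are taken as quoted rather than recomputed, and the names `conversion_z_score` and `MIN_JOBS` are made up for illustration.

```python
# 1 = the client booked again after the first-time job, 0 = they did not.
# Both lists are copied from the two example workers in the post.
worker_1 = [0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1,
            1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1]
worker_2 = [1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0,
            1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1]

# Figures quoted in the post (assumed, not recomputed from the full database).
OVERALL_MEAN = 0.38
OVERALL_SD = 0.13
MIN_JOBS = 30  # hypothetical name for the 30-job cutoff


def conversion_z_score(outcomes, mean=OVERALL_MEAN, sd=OVERALL_SD):
    """Return (worker's conversion rate - overall mean) / overall SD."""
    rate = sum(outcomes) / len(outcomes)
    return (rate - mean) / sd


for name, outcomes in [("Worker 1", worker_1), ("Worker 2", worker_2)]:
    if len(outcomes) < MIN_JOBS:
        continue  # too few first-time jobs to include
    z = conversion_z_score(outcomes)
    shade = "dark" if abs(z) > 1 else "light"
    rate = sum(outcomes) / len(outcomes)
    print(f"{name}: n={len(outcomes)}, rate={rate:.2f}, z={z:.2f} ({shade} bar)")
```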

That's pretty much it. I guess my questions are as follows:

1. Is this the correct approach, given my goals?
2. Does it make sense to limit this analysis to workers with a minimum of 30 records? I know that generally the larger the sample size the better, but would it work to apply this analysis to workers with only 20 records, for example?
3. Finally, what z-scores should I consider significant? Right now I am focusing on anything greater than 1, but is that too low?

I greatly appreciate any input.

#### jeremy_y

##### New Member
Thanks for the response.

The reason I reduce the numbers to 0's and 1's is that the total number of recurring jobs for a given client doesn't matter in this case. I'm only interested in whether the first-time job a worker was assigned to ended up recurring (or not), so it's really a binary issue.

I'm pretty new to z scores and statistics in general, but from my understanding the z score is used to show how many standard deviations a data point is above or below the mean. It's calculated using the following: z = (x - μ) / σ

The variables in the z-score formula are:
z = z-score
x = raw score or observation to be standardized
μ = mean of the population
σ = standard deviation of the population

#### victorxstc

##### Pirate
I actually tried to reply yesterday, but I wasn't sure I had understood your question. Still, I can give you some suggestions based on my understanding of it.

First, I don't think it is a very good idea to calculate the mean and SD of numbers consisting only of 0's and 1's. Such a sample will give you a very small standard deviation, so I think you should use proportion-based approaches here.
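One proportion-based approach is a one-sample z statistic for a proportion, which compares a worker's conversion rate with a baseline rate and scales the difference by the standard error for that worker's number of jobs. The sketch below is only an illustration under assumptions: the function name `proportion_z` is made up, the baseline of 0.38 is the overall rate quoted in the first post, and the two workers (10 of 20 and 15 of 30 conversions) are hypothetical.

```python
import math


def proportion_z(successes, n, p0):
    """One-sample z statistic for an observed proportion against baseline p0."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error


BASELINE = 0.38  # overall conversion rate quoted in the first post

# Two hypothetical workers with the same 50% conversion rate but
# different numbers of first-time jobs.
for successes, n in [(10, 20), (15, 30)]:
    z = proportion_z(successes, n, BASELINE)
    print(f"{successes}/{n} converted: z = {z:.2f}")
```

Because the standard error shrinks as the number of jobs grows, the same observed rate yields a larger z for the worker with 30 jobs than for the worker with 20.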

Second, I agree that it is a good idea to include workers who have had a larger number of first-time jobs. However, it is also advantageous to include a larger number of workers. Both contribute to a more reliable estimate of the variation. So I think a larger number of workers with a little less work experience might be better than only 7 workers.

I hope I can give you more suggestions, but we may need to discuss your case further before I fully understand it.

Also, please clarify how you are using the z-score, as I'm not very familiar with it or its usage.

+++++++++++++++++

If the z score depends on the standard deviation, then I think the result is again not very useful, because a standard deviation calculated from 0's and 1's would be very (unrealistically) small.

Thanks also for the explanation of the z-score. So it expresses how far a value lies from the mean, in units of standard deviation. Since it again depends on the standard deviation, I think we should double-check that the standard deviation is correct.
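One way to double-check is to use the fact that, for data made up only of 0's and 1's with mean p, the population standard deviation is sqrt(p(1 - p)). The sketch below compares that formula with a direct calculation; the sample of 38 ones and 62 zeros is hypothetical and chosen only so that its mean is exactly the 0.38 quoted in the first post.

```python
import math
import statistics

p = 0.38  # overall conversion rate quoted in the first post

# Standard deviation of a 0/1 variable with mean p.
bernoulli_sd = math.sqrt(p * (1 - p))

# Hypothetical sample with mean exactly 0.38, used to cross-check the formula.
sample = [1] * 38 + [0] * 62
sample_sd = statistics.pstdev(sample)

print(f"formula: {bernoulli_sd:.3f}, direct: {sample_sd:.3f}")
```

Both values come out at about 0.485, which is larger than the 0.13 quoted in the first post, so it may be worth checking which quantity that 0.13 measures.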

You asked: "Finally, what z-scores should I consider significant? Right now I am focusing on anything greater than 1, but is that too low?"
You can consider a z score significant at the 5% level (two-sided) if it is greater than 1.96 or less than -1.96 (but please first make sure your standard deviations are not problematic).
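To see where a cutoff such as 1.96 comes from, the two-sided tail probability of a z score under the standard normal distribution can be computed with the standard library alone. This is a sketch that assumes the z score really is approximately normal; the function name `two_sided_p_value` is made up for illustration.

```python
import math


def two_sided_p_value(z):
    """Probability of a standard normal value at least as extreme as z."""
    return math.erfc(abs(z) / math.sqrt(2))


for z in (1.0, 1.96, 2.58):
    print(f"|z| > {z}: p = {two_sided_p_value(z):.3f}")
```

Under that assumption, |z| > 1 occurs by chance roughly 32% of the time, while |z| > 1.96 occurs about 5% of the time, which is why a cutoff of 1 flags many workers who are not actually unusual.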

Edited:

As far as I remember, the z score and its 1.96 significance threshold are valid for the normal distribution. So I think you might have difficulty using a completely non-normally distributed sample of 0's and 1's to calculate a z-score and base your conclusion on it.
