# Simple regression between 2 variables using an average

#### batman1

##### New Member
I have a data sheet with 2 variables. I'm measuring whether doing a certain task at work increases worker fatigue. The sampling was done over 28 days, with 2 shifts per day. The workers may not do the task continuously, e.g. they might use it for 3 out of 4 hours in one shift and 4 out of 4 hours in another shift, etc.

I have a subjective fatigue score (the difference between before and after) and the number of hours spent doing the task. If I run a basic linear regression and plot it, I get an R² of 0.32 and a fitted line of y = 0.58x + 2.18. This is for the morning shift. The results for the evening shift are similar.
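For reference, here is a minimal sketch of that kind of fit in Python with NumPy. The helper name `fit_line` and the six data points are hypothetical placeholders, not my real measurements; it only shows how the slope, intercept and R² are obtained from the raw pairs.

```python
import numpy as np

def fit_line(x, y):
    """Fit y = slope * x + intercept by least squares and return R^2 as well."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)       # degree-1 polynomial = straight line
    predicted = slope * x + intercept
    ss_res = np.sum((y - predicted) ** 2)        # scatter left over around the line
    ss_tot = np.sum((y - y.mean()) ** 2)         # total scatter around the mean
    return slope, intercept, 1.0 - ss_res / ss_tot

# Placeholder values only (assumed for illustration, not the real data sheet).
hours = [2, 2, 3, 3, 4, 4]
fatigue_change = [3, 4, 3, 5, 4, 6]

slope, intercept, r2 = fit_line(hours, fatigue_change)
print(f"y = {slope:.2f}x + {intercept:.2f}, R^2 = {r2:.2f}")
```

Each row here is one shift, so the R² describes how well hours on the task predict the fatigue change of an individual shift.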

However, if I convert the number of hours doing the task to a percentage of the total shift time, and take the average fatigue score for each percentage range, the R² value rises to 0.98.

The number of points is reduced to 3 (50%, 75% and 100%, each with one average fatigue score).
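To make the comparison concrete, here is a rough sketch of the two calculations side by side. Everything in it is assumed for illustration: the data are randomly generated (10 shifts at each percentage, a weak made-up trend and a made-up noise level), and `r_squared` is a hypothetical helper. With equal group sizes, the R² of the group means can never be lower than the R² of the raw points, because averaging removes the shift-to-shift scatter within each percentage before the line is fitted.

```python
import numpy as np

def r_squared(x, y):
    """R^2 of the least-squares straight line through the points (x, y)."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return 1.0 - np.sum(residuals ** 2) / np.sum((y - np.mean(y)) ** 2)

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Synthetic data, NOT the real measurements: 10 shifts at each task percentage.
percent = np.repeat([50.0, 75.0, 100.0], 10)
# Assumed weak trend plus large shift-to-shift noise (both values are made up).
fatigue = 2.0 + 0.03 * percent + rng.normal(0.0, 1.0, size=percent.size)

# Method 1: regression on every individual shift.
r2_raw = r_squared(percent, fatigue)

# Method 2: average the fatigue score per percentage, then fit the 3 means.
levels = np.unique(percent)
mean_fatigue = np.array([fatigue[percent == p].mean() for p in levels])
r2_means = r_squared(levels, mean_fatigue)

print(f"R^2 on raw shifts:   {r2_raw:.2f}")
print(f"R^2 on 3 averages:   {r2_means:.2f}")
```

The two numbers answer different questions: the first is about predicting an individual shift, the second about how well a line passes through three averages.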

I'm new to statistics and trying to get my head around it.

I'm hoping to get some opinions on which is better: the first method, using all the 'raw' data, or the second, converting to percentages and using fewer points. Or is one of them simply not acceptable?