# Paired sample data analysis

##### New Member
Hey, I'm an engineering student from New Zealand. Our mathematical modelling paper has a small statistics section, which I'm not too confident in.
Most of our work uses R, but the notes we are given are very vague and I'm getting confused reading them.

Anyway, I have a set of data for men and women who exercise. The table has 6 columns: 'subject (i.e. person) number', 'gender', 'measured weight', 'measured height', 'reported weight' (the weight the subject thinks they are) and 'reported height' (the height the subject thinks they are).

The question I'm trying to answer is: "Determine whether there is a difference between the measured weight and reported weight of women"
Now, is this paired data? I'm pretty sure it is, because it's from the same sample, under the same conditions, with the same number of subjects?

Thinking that this was the case, I continued on, following the vague course book material.
What I did was:
- made a data vector called 'f.mwt', which is the measured weight of the females in the sample
- made a data vector called 'f.rwt', which is the reported weight of the females in the sample
- made another data vector called 'w.diff', which is the difference between the two vectors (by subtraction)
- did a qqnorm plot for normality and fitted it with a straight line, which gave this:

[qqnorm plot of 'w.diff']

I thought this didn't look exactly normal, but close, so I did a Shapiro-Wilk test and got a p-value of 1.692e-05, which is basically small enough to throw away any assumption of normality, isn't it?
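The steps above can be sketched in R roughly as follows. The real table is not shown in the thread, so the data here are simulated and purely hypothetical (the sample size `n`, the means and the spreads are all assumptions); only the vector names `f.mwt`, `f.rwt` and `w.diff` come from the post.

```r
# Hypothetical data: simulated only to make the sketch runnable.
set.seed(1)
n <- 40                                              # assumed number of women
f.mwt <- rnorm(n, mean = 60, sd = 8)                 # simulated measured weights
f.rwt <- round(f.mwt + rnorm(n, mean = -1, sd = 2))  # simulated reported weights

# Paired differences: one value per subject
w.diff <- f.mwt - f.rwt

# Visual check of normality: Q-Q plot with a reference line
qqnorm(w.diff)
qqline(w.diff)

# Formal check: Shapiro-Wilk test on the differences
sw <- shapiro.test(w.diff)
sw$p.value
```

The normality check is applied to `w.diff` rather than to the two weight vectors separately, because in a paired analysis it is the differences that get tested.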

In class, when we found a sample that wasn't normal, we did a transformation (are you allowed to do this for paired data?). I thought I might as well try it.
The first thing we did was make a boxcoxplot of the data, read a number off the graph and then do the transformation from there.
But when I try to do a boxcoxplot, using the entry "boxcoxplot(w.diff)", I get the following error:
"Error in var(power.trans(x, p)) : missing observations in cov/cor
NaNs produced in: log(x) "

I have no idea what this error means and I've been searching frantically on the internet to find a solution.
Has anyone here had this error before? Have I done something wrong? Is this even the way that you're meant to analyse paired samples?

Any help would be appreciated.
Thanks


##### New Member
Update:
I noticed that you can't do a boxcoxplot on negative values, so I added 10 to the data vector w.diff.
The boxcoxplot then gave a value of ~0.8, which I used to do a power transformation. I did a Shapiro test on the transformed data and got a p-value of 1.624e-05, which is still really small.

Any ideas?
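The "NaNs produced in: log(x)" error is consistent with the data containing negative differences, since a Box-Cox transformation is only defined for positive values. Here is a minimal sketch of that behaviour. `box_cox` is a hypothetical helper written for this illustration, not the course's `boxcoxplot`, and the values in `x` are made up.

```r
# Hypothetical helper: the Box-Cox power transformation for one lambda.
box_cox <- function(x, lambda) {
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

x <- c(-2.5, -1, 0.5, 3)   # made-up differences, some of them negative

# log() of a negative number is NaN (R also issues a warning)
bad <- suppressWarnings(box_cox(x, 0))
bad

# Shifting by a constant (10, as in the post) makes every value positive
shift <- 10
good <- box_cox(x + shift, 0.8)
good
```

Adding a constant removes the error, but it does not change the shape of the distribution much, which may be why the Shapiro-Wilk p-value barely moved.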

#### JohnM

##### TS Contributor
I think you can take a simpler approach here. In order to determine if there is a difference between the measured and reported weights, just do a paired-sample t-test.

For each person, compute "delta" = measured weight - reported weight

Then do a t-test to see if the average of the deltas is significantly different from 0.

This link shows how the test is conducted:
http://davidmlane.com/hyperstat/B70211.html
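In R, the test described above can be sketched like this. The data are simulated and hypothetical (sample size, means and spreads are assumptions); `t.test` is the base R function from the `stats` package.

```r
# Hypothetical data, simulated only for illustration
set.seed(2)
f.mwt <- rnorm(30, mean = 60, sd = 8)          # simulated measured weights
f.rwt <- f.mwt + rnorm(30, mean = -1, sd = 2)  # simulated reported weights

delta <- f.mwt - f.rwt

# One-sample t-test of the deltas against 0 ...
t_delta <- t.test(delta, mu = 0)

# ... which is the same test as a paired t-test on the two vectors
t_paired <- t.test(f.mwt, f.rwt, paired = TRUE)

# The statistic by hand: mean(delta) / (sd(delta) / sqrt(n))
t_manual <- mean(delta) / (sd(delta) / sqrt(length(delta)))

c(t_delta$statistic, t_paired$statistic, t_manual)
```

All three numbers agree, so it makes no difference whether the deltas are computed first or the two vectors are passed with `paired = TRUE`.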

##### New Member
Thanks JohnM, I'd been considering just doing the test like that, but when I saw the results of the Shapiro-Wilk test for normality I thought I needed to do something about it.

I'll just take the averages and do a t-test.

thanks again

#### JohnM

##### TS Contributor
The t-test and ANOVA are pretty robust to departures from normality.

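The other thing to remember is that the inferences drawn from these tests are about sample means, not individual observations from the underlying populations. When the data are normal, the standardized sample mean follows a t distribution for small n, and the distribution of the sample mean approaches a normal distribution as n gets larger, even when the underlying data are not exactly normal.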

This also applies to differences between means.
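A small simulation can illustrate the point about sample means. This is only a sketch under assumed settings: the exponential population, the sample sizes 5 and 100, and the helper names `sample_means` and `skew` are all chosen for illustration.

```r
set.seed(3)

# Means of `reps` samples of size n drawn from a skewed (exponential) population
sample_means <- function(n, reps = 5000) {
  replicate(reps, mean(rexp(n, rate = 1)))
}

means_small <- sample_means(5)
means_large <- sample_means(100)

# Sample skewness; close to 0 for a symmetric distribution
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

c(n5 = skew(means_small), n100 = skew(means_large))
```

The means of the larger samples are much less skewed than the means of the smaller ones, even though the individual observations come from the same strongly skewed population.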

##### New Member
Is this a common situation?

If I do Shapiro tests on both f.mwt and f.rwt, the samples are approximately normal, with p-values of 0.1032 for 'f.mwt' and 0.05942 for 'f.rwt'.
Did this departure from normality I'm getting with 'w.diff' occur because of the subtraction? And because the range of values is so small?

I'm probably looking a bit too much into this, I should just do a t-test, but this is interesting.

#### JohnM

##### TS Contributor
There could be endless reasons why. Don't read too much into slight variations in the p-values; it's basically due to random sampling error.

##### New Member
Hmmm, in my work how should I describe the small p-value? As a sampling error? Or just leave the Shapiro test out completely?

I've got a qqnorm, boxplot, densityplot and histogram which all look normal. I'm thinking of just putting those in my work and basing my assumption of normality on those.

#### JohnM

##### TS Contributor
What is the sample size here? The Shapiro-Wilk test is mainly for small sample sizes.

Also, for large sample sizes, it's not that difficult to get a sample that visually appears normal but is "statistically" different from normal. Don't sweat it; my previous comments about distributions of sample means apply.
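The sample-size effect can be sketched with a simulation. Everything here is an assumption made for illustration: the mildly skewed gamma population, the two sample sizes, and the helper name `draw`.

```r
set.seed(4)

# Mildly skewed population (gamma with shape 20), chosen only for illustration
draw <- function(n) rgamma(n, shape = 20, rate = 1)

# Same population, two sample sizes
p_small <- shapiro.test(draw(20))$p.value    # small sample
p_large <- shapiro.test(draw(5000))$p.value  # shapiro.test accepts at most 5000 values

c(small = p_small, large = p_large)
```

With thousands of observations the test has enough power to flag a departure from normality that would be hard to see in a histogram or Q-Q plot, so a tiny p-value on a large sample does not by itself mean the t-test is unusable.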