# Choosing the correct statistical test

#### SimonP

Hello,

If I measure a specific variable (Likert scale) on day D, then measure the same variable again on day D + 48 hours, on 24 subjects, is it correct to compare these two sets of data with a McNemar test? Or should I use a chi2 test?

My logic is that I am looking at the influence that time has on the variable measured on the Likert scale, so that would be comparing two qualitative variables. Can I say that "time" is a qualitative variable in this specific example?

Thanks

#### katxt

McNemar and chi2 tests are for count data. It sounds like you need a paired sample test. The Wilcoxon signed rank test should be fine, or even the paired t test with 24 subjects.
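
As a rough illustration of the signed rank test mentioned above, here is a minimal pure-Python sketch of an exact two-sided version. The function name `signed_rank_exact` and the example scores are made up for illustration. Zero differences are dropped and tied absolute differences get midranks, which is one common convention rather than the only one.

```python
from collections import Counter


def signed_rank_exact(before, after):
    """Exact two-sided Wilcoxon signed rank test for paired data.

    Zero differences are dropped; tied |differences| receive midranks.
    Returns (W_plus, p_value).
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Midranks of the absolute differences, doubled so they stay integers.
    abs_sorted = sorted(abs(d) for d in diffs)
    doubled_rank = {}
    i = 0
    while i < n:
        j = i
        while j < n and abs_sorted[j] == abs_sorted[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j; twice their average is i+1+j.
        doubled_rank[abs_sorted[i]] = i + 1 + j
        i = j
    ranks2 = [doubled_rank[abs(d)] for d in diffs]
    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)

    # Null distribution of W+: each rank is positive with probability 1/2.
    dist = Counter({0: 1})
    for r in ranks2:
        new = Counter()
        for s, count in dist.items():
            new[s] += count        # this rank gets a minus sign
            new[s + r] += count    # this rank gets a plus sign
        dist = new

    # Two-sided p: sums at least as far from the centre as the observed one.
    total2 = sum(ranks2)
    observed_dev = abs(2 * w_plus2 - total2)
    extreme = sum(c for s, c in dist.items() if abs(2 * s - total2) >= observed_dev)
    return w_plus2 / 2, extreme / 2 ** n


# Hypothetical scores for 6 subjects on day D and day D + 48 h.
day_0 = [5, 4, 6, 3, 5, 4]
day_2 = [3, 3, 4, 3, 2, 5]
w_plus, p_value = signed_rank_exact(day_0, day_2)
print(f"W+ = {w_plus}, two-sided p = {p_value:.4f}")
```

With real data one would more likely call `scipy.stats.wilcoxon` or `scipy.stats.ttest_rel`. The sketch is only meant to show what the statistic is built from: the ranks of the absolute differences, so the sizes of the differences are compared with each other.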

#### Karabiner

> if I measure a specific variable (likert scale) on Day D,

Do you really mean a Likert scale (an instrument which consists of several Likert-type items),
or just one Likert-type item? In the latter case, maybe this should be treated as an ordinal
scaled measurement, and you could use the sign test.

With kind regards

Karabiner

#### SimonP

Hello,

The item I am measuring is supposed to be an ordinal "Likert scale" (a scale from 1 to 7 describing the pain, where 7 = pain at rest, 6 = pain when walking, 5 = pain when walking up the stairs, ...).
What would you say in this case?

I found that "If X and Y are quantitative variables, the sign test can be used to test the hypothesis that the difference between the X and Y has zero median, assuming continuous distributions of the two random variables X and Y, in the situation when we can draw paired samples from X and Y"

Considering that a Likert scale is not a quantitative variable, the sign test would not be appropriate, would it?

Thank you

#### SimonP

A paired sample test or Wilcoxon test would only work to compare the effect that one (qualitative) variable has on another (quantitative) variable. Here, I think both measures are qualitative. Hence my question.

#### SimonP

Hello,

To me, a "Likert scale" is an ordinal qualitative variable, not a quantitative variable.

Let's say you attend a training session, and I ask you afterwards how efficient you think the training was, with the following options:
1- not efficient
2- slightly efficient
3- efficient
4- very efficient

This is qualitative, not quantitative. The numbers are just there to reflect the "level of efficiency", but "not efficient" or "very efficient" is quality, not quantity.

You cannot "add" two levels of efficiency, nor compute a mean and say that "if 10 people answered "1" and 10 people answered "4", then I can take a mean of 2.5".

In my example, saying that "you feel pain at rest (7)" or "you do not feel pain at rest but when walking (6)" describes the quality of the pain. But the difference of 1 between levels 6 and 7 is not the same as the difference of 1 between levels 5 and 6.

Does that make sense?

#### Karabiner

I must confess that I do not know what you mean. A paired samples test does not do what you describe,
and the two measures are not strictly qualitative, but ordered categorical (rank variables).

With kind regards

Karabiner

#### katxt

Having seen your explanation of the scale, I agree with Karabiner. I would go with the Wilcoxon signed rank test.

#### Karabiner

Not exactly. The Wilcoxon signed rank test requires interval scaled variables,
because the magnitudes of differences are compared. For a true ordinal scale
such as that presented here, I'd say the sign test is suitable.

With kind regards

Karabiner
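
To make the distinction concrete, here is a minimal sketch of an exact two-sided sign test in plain Python. It uses only the direction of each subject's change, never its size, which is why it needs nothing more than an ordinal scale. The function name `sign_test` and the pain scores are hypothetical, chosen only for illustration.

```python
from math import comb


def sign_test(before, after):
    """Exact two-sided sign test for paired ordinal data.

    Only the direction of each change is used; unchanged pairs are dropped.
    Returns (n_increase, n_decrease, p_value).
    """
    n_up = sum(a > b for b, a in zip(before, after))
    n_down = sum(a < b for b, a in zip(before, after))
    n = n_up + n_down
    if n == 0:
        return 0, 0, 1.0
    k = min(n_up, n_down)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_up, n_down, min(1.0, 2 * tail)


# Hypothetical 7-point soreness scores for 10 subjects.
day_0 = [7, 6, 6, 5, 7, 4, 6, 5, 6, 7]
day_2 = [5, 6, 4, 4, 6, 5, 3, 4, 5, 6]
n_up, n_down, p_value = sign_test(day_0, day_2)
print(f"{n_up} increased, {n_down} decreased, two-sided p = {p_value:.4f}")
```

Dropping the unchanged pairs is the usual convention, but it does mean that a sample with many ties has less information than its size suggests.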

#### SimonP

Hello,

I am afraid that the studies published in the scientific literature would disagree with your opinion, and that this scale definitely qualifies as a "Likert scale".
It is even called the "7-Point Likert Scale of Lower Limb Muscle Soreness" and was published in Clinical Journal of Sport Medicine (full reference: Impellizzeri FM, Maffiuletti NA. Convergent Evidence for Construct Validity of a 7-Point Likert Scale of Lower Limb Muscle Soreness. Clinical Journal of Sport Medicine 2007; 17(6): 494–496).

If this is true, then I would welcome some additional explanation.
If I go back to the basics of how I was taught statistics, the very first thing to do in order to choose a test is:
1) to define the independent (X) and dependent (Y) variables
2) to define whether these variables are qualitative or quantitative

- If X and Y are both quantitative, then a Spearman/Pearson correlation is used
- If X is quantitative and Y qualitative (or vice-versa) then there are several options (depending on the normal/non-normal distribution; the fact that the groups are paired/not-paired; the fact that the variances are equal/not equal, the fact that there are 2 groups or more than 2 groups, ...)
- If X and Y are both qualitative, then depending on the case it would be a Fisher test, McNemar test, Chi2 test or Kappa test.

Do you agree with this ?

A paired sample test, such as the "Student t test" or the "Wilcoxon test", is a test that compares two sets of data when the independent variable is qualitative and the dependent variable is quantitative. Let's take the following example to explain my point:

If I have a group of 60 people, and I then:
- measure their heart rate
- split the group in two (A and B), A being the test group and B being the placebo group
- give the pill to group A and nothing to group B
- measure the heart rate of all 60 people again the next day

I can perform a statistical test to show "the effect that this pill has had on the heart rate" and therefore see:
- if "there is a statistical difference before/after taking the pill for group A" (and if there is a statistical difference between day 1 and day 2 for group B), as well as
- if "there is a statistical difference between group A and group B on the next day".

In this case, people either take or do not take the pill (the independent variable is qualitative), and the measured variable (the dependent variable) is the heart rate (quantitative).

I will therefore have measured the effect that the independent variable has on the dependent variable (i.e., the effect that the pill has on the heart rate), which means that:
- if I make an intragroup comparison, I will have to use a "paired t test" (or Wilcoxon, for a non-normal distribution); and
- if I make an intergroup comparison, I will have to use a "Welch test" or an equal-variance "Student t test" (or a "Mann-Whitney test" for a non-normal distribution).

The reason these tests are chosen is very much linked to the type of variable (quantitative or qualitative).
If, for example, I wished to see the link between two quantitative variables (e.g., the link between the weight of a mouse and the length of its tail), then I would need to perform a Pearson (or Spearman) correlation.
And if I wished to see the link between two qualitative variables (e.g., the link between taking/not taking a pill and being sick/healthy), then I would most likely need to use a Chi2 test.
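
For the pill versus sick/healthy case, a chi2 test on a 2x2 table of counts could be sketched as below. This is a minimal pure-Python version without continuity correction; the function name `chi2_2x2` and the counts are invented for illustration. It also shows why this family of tests works on counts of subjects per cell rather than on paired scores.

```python
from math import erfc, sqrt


def chi2_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    No continuity correction. Returns (statistic, p_value) with 1 df.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))


# Hypothetical counts: rows = pill / no pill, columns = healthy / sick.
counts = [[20, 10],
          [10, 20]]
statistic, p_value = chi2_2x2(counts)
print(f"chi2 = {statistic:.3f}, p = {p_value:.4f}")
```

With small expected counts, an exact test such as Fisher's is generally preferred over this approximation.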

There is nonetheless a HUGE difference between the example mentioned above and my original example.
In the original example, the 7-point scale (called "likert scale" by those who developed it) is a qualitative scale (not quantitative).
It therefore means here that the measured variable is qualitative, not quantitative.

I hope this explains my viewpoint better.

#### Karabiner

> It is even called the "7-Point Likert Scale of Lower Limb Muscle Soreness" and was published in Clinical Journal of Sport Medicine (full reference: Impellizzeri FM, Maffiuletti NA. Convergent Evidence for Construct Validity of a 7-Point Likert Scale of Lower Limb Muscle Soreness. Clinical Journal of Sport Medicine 2007; 17(6): 494–496).

Well, a single item cannot be a Likert scale (although many people confuse the response scale of the item with
the complete scale), and calling the answering format of that item a "Likert" format seems quite strange, given the
clear and simple definition of the Likert item. But anyway, the main problem now is not its labeling, but
which scale level it has and how it was used in the present study.

> There is nonetheless a HUGE difference between the example mentioned above and my original example.
> In the original example, the 7-point scale (called "likert scale" by those who developed it) is a qualitative scale (not quantitative).
> It therefore means here that the measured variable is qualitative, not quantitative.

The huge difference is that you do not have 2 groups, therefore the example is not helpful.

If I understand you correctly, you want to find out whether there is a change with regard
to the 7-point measure between first and second assessment. So you can opt for using
the measurement as ordinal scaled (which makes very much sense in my opinion, since
I cannot see how the difference between for example "pain at rest" and "pain when walking"
can be proven to be exactly the same as the difference between "pain when walking" and
"pain when walking up the stairs") and use the sign test. Or, if your sources say otherwise,
and your reviewer accepts that, then you could treat the variable as interval scaled, and
use dependent samples t-test or the signed rank test.

With kind regards

Karabiner

#### katxt

Or a permutation test perhaps? kat
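
One way a permutation test for paired data might look is the sign-flip version sketched below, in plain Python. The function name `paired_permutation_test`, the default of 10000 resamples and the fixed seed are all assumptions made for the sketch. Note that the statistic used here is the sum of the paired differences, so it does use the sizes of the differences and treats the scores as interval scaled, much like the paired t test.

```python
import random


def paired_permutation_test(before, after, n_resamples=10000, seed=0):
    """Monte Carlo sign-flip permutation test on the paired differences.

    Under the null hypothesis of no change, each difference is equally
    likely to carry either sign. Returns an estimated two-sided p-value.
    """
    rng = random.Random(seed)
    diffs = [a - b for b, a in zip(before, after)]
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(n_resamples):
        # Randomly flip the sign of every difference and recompute the sum.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical 7-point scores for 10 subjects.
day_0 = [7, 6, 6, 5, 7, 4, 6, 5, 6, 7]
day_2 = [5, 6, 4, 4, 6, 5, 3, 4, 5, 6]
print(f"estimated two-sided p = {paired_permutation_test(day_0, day_2):.4f}")
```

Replacing each difference by its sign before summing would turn this into a resampling version of the sign test, which stays within the ordinal interpretation.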