# Choosing the correct statistical test

#### SimonP

##### New Member
Hello,

If I measure a specific variable (Likert scale) on day D, then measure the same variable again on day D + 48 hours, in 24 subjects, is it correct to compare these two sets of data with a McNemar test? Or should I use a chi2 test?

My logic is that I am looking for the influence that time has on the variable measured on the Likert scale, so that would be comparing two qualitative variables. Can I say that "time" is a qualitative variable in this specific example?

Thanks

#### katxt

##### Well-Known Member
McNemar and chi2 tests are for count data. It sounds like you need a paired sample test. The Wilcoxon signed rank test should be fine, or even the paired t test with 24 subjects.
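
To make the "count data" point concrete, here is a minimal sketch of an exact McNemar test in plain Python. It assumes each subject's answer has been reduced to yes/no at both time points, which is not the case for a 7-point item. The function name `mcnemar_exact` and the counts are made up for illustration and do not come from the thread.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant counts.

    b = pairs that changed from "yes" to "no",
    c = pairs that changed from "no" to "yes".
    Pairs that did not change do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Under H0 a discordant pair is equally likely to go either way,
    # so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical data: "pain present?" (yes/no) on day D and on day D + 48 h.
# 9 subjects went from yes to no, 2 went from no to yes.
p_value = mcnemar_exact(b=9, c=2)
print(p_value)
```

The input is a pair of counts from a 2x2 table, not the individual scores, which is why this test does not fit a measurement with seven ordered levels.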

#### Karabiner

##### TS Contributor
if I measure a specific variable (likert scale) on Day D,
Do you really mean Likert scale (an instrument which consists of several Likert-type items),
or just one Likert-type item? In the latter case, maybe this should be treated as an ordinal
scaled measurement, and you use the sign test.

With kind regards

Karabiner
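
The sign test suggested above can be sketched in a few lines of plain Python. This is only an illustration: the scores below are invented, the function name `sign_test` is hypothetical, and subjects whose score did not change are simply dropped, which is a common convention but an assumption here.

```python
from math import comb


def sign_test(before, after):
    """Two-sided exact sign test for paired ordinal scores.

    Returns (n_increase, n_decrease, p_value).
    Only the direction of each change is used, never its size.
    """
    pos = sum(1 for x, y in zip(before, after) if y > x)
    neg = sum(1 for x, y in zip(before, after) if y < x)
    n = pos + neg  # ties (no change) are dropped
    if n == 0:
        return pos, neg, 1.0
    k = min(pos, neg)
    # Under H0 increases and decreases are equally likely: Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return pos, neg, min(1.0, 2 * tail)


# Invented 7-point scores for 24 subjects (not real data)
day_d = [5, 6, 4, 7, 5, 3, 6, 5, 4, 6, 5, 7, 3, 4, 6, 5, 5, 4, 6, 7, 5, 4, 3, 6]
day_d48 = [4, 5, 4, 6, 5, 2, 5, 5, 3, 6, 4, 6, 3, 5, 5, 4, 5, 3, 5, 6, 6, 4, 2, 5]

n_up, n_down, p_value = sign_test(day_d, day_d48)
print(n_up, n_down, p_value)
```

Because the test only asks whether each subject went up or down, it needs nothing more than an ordering of the categories.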

#### SimonP

##### New Member
Do you really mean Likert scale (an instrument which consists of several Likert-type items),
or just one Likert-type item? In the latter case, maybe this should be treated as an ordinal
scaled measurement, and you use the sign test.

With kind regards

Karabiner
Hello,

The item I am measuring is supposed to be an ordinal "Likert scale" (a scale from 1 to 7 rating the pain, where 7 = pain at rest, 6 = pain when walking, 5 = pain when walking up the stairs, ....).
What would you say in this case?

I found that "If X and Y are quantitative variables, the sign test can be used to test the hypothesis that the difference between the X and Y has zero median, assuming continuous distributions of the two random variables X and Y, in the situation when we can draw paired samples from X and Y"

Since a Likert scale is not a quantitative variable, the sign test would not be appropriate, would it?

Thank you

#### SimonP

##### New Member
McNemar and chi2 tests are for count data. It sounds like you need a paired sample test. The Wilcoxon signed rank test should be fine, or even the paired t test with 24 subjects.
A paired sample test or Wilcoxon test would only work to compare the effect that one (qualitative) variable has on another (quantitative) variable. Here, I think both measures are qualitative. Hence my question.

#### SimonP

##### New Member
You have a quantitative variable here, which was measured twice within the same sample.
Why do you assume it is qualitative?

And if you want to know whether there is an effect of "time", then this effect is included
in the difference between the measurements at t1 and t2. Hence, paired samples t-test
or Wilcoxon signed rank test for dependent samples seem appropriate. McNemar's test is
for repeated measures of a dichotomous variable.

With kind regards

Karabiner
Hello,

To me, a "Likert scale" is an ordinal qualitative variable, not a quantitative variable.

Let's say you attend a training session, and I ask you afterwards how efficient you think the training was, with the following options:
1- not efficient
2- slightly efficient
3- efficient
4- very efficient

This is qualitative, not quantitative. The numbers are just there to reflect the "level of efficiency", but "not efficient" or "very efficient" is quality, not quantity.

You cannot "add" two levels of efficiency, nor take a mean and say that "if 10 people answered "1" and 10 people answered "4", then the mean is "2.5"".

In my example, saying that "you feel pain at rest (7)" or "you do not feel pain at rest but when walking (6)" describes the quality of the pain. But the difference of "1" between levels 6 and 7 is not the same as the difference of "1" between levels 5 and 6.

Does that make sense?

#### Karabiner

##### TS Contributor
The item I am measuring is supposed to be an ordinal "Likert scale" (a scale from 1 to 7 rating the pain, where 7 = pain at rest, 6 = pain when walking, 5 = pain when walking up the stairs, ....).
This has not the least to do with Likert or Likert scale (which is a measurement instrument composed
of several Likert-type items), or a single Likert type item, or a Likert item response scale. If you
use the term "Likert scale", then you refer to an interval scaled measurement.

But regardless of the labeling, you have an ordinal scaled variable which was measured twice within the
same sample. If you want to know whether there was an effect of time, then this effect is included in the
difference between the ordinal scaled measurements at t1 and t2. The test for this difference is the sign test.

A paired sample test or Wilcoxon test would only work to compare the effect that one (qualitative) variable has on another (quantitative) variable. Here, I think both measures are qualitative.
I must confess that I do not know what you mean. A paired samples test does not do what you describe,
and both measures are not strictly qualitative, but ordered categorical (rank variables).

With kind regards

Karabiner

#### katxt

##### Well-Known Member
Having seen your explanation of the scale, I agree with Karabiner. I would go with the Wilcoxon sign test.

#### Karabiner

##### TS Contributor
Not exactly. The Wilcoxon signed rank test requires interval scaled variables,
because the magnitudes of differences are compared. For a true ordinal scale
such as that presented here, I'd say the sign test is suitable.

With kind regards

Karabiner
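
The point about magnitudes can be illustrated with a small sketch in plain Python. It computes the signed rank sums and the sign counts twice: once with the usual 1 to 7 coding, and once after an order-preserving recoding of the same categories. The data, the recoding and the function names are all made up for illustration.

```python
def signed_rank_sums(before, after):
    """Return (W_plus, W_minus), the rank sums of positive and negative differences.

    Zero differences are dropped; tied absolute differences get average ranks.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    abs_sorted = sorted(abs(d) for d in diffs)
    rank_of = {}
    i = 0
    while i < len(abs_sorted):
        j = i
        while j < len(abs_sorted) and abs_sorted[j] == abs_sorted[i]:
            j += 1
        rank_of[abs_sorted[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    w_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
    w_minus = sum(rank_of[abs(d)] for d in diffs if d < 0)
    return w_plus, w_minus


def sign_counts(before, after):
    """Return (number of increases, number of decreases)."""
    pos = sum(a > b for b, a in zip(before, after))
    neg = sum(a < b for b, a in zip(before, after))
    return pos, neg


# Hypothetical paired scores on the 1..7 coding
before = [7, 6, 5, 3, 2, 4]
after = [6, 4, 6, 2, 4, 4]

# Same category order, different (equally arbitrary) spacing between levels
recode = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10, 6: 11, 7: 20}
before_recoded = [recode[x] for x in before]
after_recoded = [recode[x] for x in after]

print(signed_rank_sums(before, after), sign_counts(before, after))
print(signed_rank_sums(before_recoded, after_recoded),
      sign_counts(before_recoded, after_recoded))
```

In this example the signed rank sums change with the recoding while the sign counts stay the same, which is the sense in which the signed rank test leans on the spacing between levels and the sign test does not.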

#### SimonP

##### New Member
This has not the least to do with Likert or Likert scale (which is a measurement instrument composed
of several Likert-type items), or a single Likert type item, or a Likert item response scale. If you
use the term "Likert scale", then you refer to an interval scaled measurement.

But regardless of the labeling, you have an ordinal scaled variable which was measured twice within the
same sample. If you want to know whether there was an effect of time, then this effect is included in the
difference between the ordinal scaled measurements at t1 and t2. The test for this difference is the sign test.
Hello,

I am afraid that the studies published in the scientific literature would disagree with your opinion; this scale definitely has to do with a "Likert scale".
It is even called the "7-Point Likert Scale of Lower Limb Muscle Soreness" and was published in Clinical Journal of Sport Medicine (full reference : Impellizzeri FM, Maffiuletti NA. Convergent Evidence for Construct Validity of a 7-Point Likert Scale of Lower Limb Muscle Soreness: Clinical Journal of Sport Medicine 2007; 17(6): 494–496.)

I must confess that I do not know what you mean. A paired samples test does not do what you describe,
and both measures are not strictly qualitative, but ordered categorical (rank variables).

With kind regards

Karabiner
If this is true, then I would welcome some additional explanation.
If I go back to the basics of how I was taught statistics, then the very first thing to do, in order to choose a test, is:
1) to define the independent (X) and dependent (Y) variables
2) to define whether these variables are qualitative or quantitative

- If X and Y are both quantitative, then a Spearman/Pearson correlation is used
- If X is quantitative and Y qualitative (or vice-versa) then there are several options (depending on the normal/non-normal distribution; the fact that the groups are paired/not-paired; the fact that the variances are equal/not equal, the fact that there are 2 groups or more than 2 groups, ...)
- If X and Y are both qualitative, then depending on the case it would be a Fisher test, McNemar test, Chi2 test or Kappa test.

Do you agree with this ?

A paired sample test, such as the "Student t test" or the "Wilcoxon test", is a test that compares two sets of data when the independent variable is qualitative and the dependent variable is quantitative. Let's take the following example to explain my point:

If I have a group of 60 people, and I then:
- measure their heart rate
- split the group in two (A and B), A being the test group and B being the placebo group
- give the pill to group A and nothing to group B
- measure the heart rate of all 60 people again the next day

I can perform a statistical test to show "the effect that this pill has had on the heart rate" and therefore see:
- if "there is a statistical difference before/after taking the pill for group A" (and see if there is a statistical difference between day 1 and day 2 for group B) as well as
- if "there is a statistical difference between group A and group B on the next day".

In this case, either the people will/will not take the pill (the independent variable is qualitative) and the measured variable (dependent variable) is the heart rate (quantitative).

I will therefore have measured the effect that the independent variable has on the dependent variable (i.e., I will have measured the effect that the pill has on the heart rate), which means that:
- if I make an intragroup comparison I will have to use a "paired t test" (or Wilcoxon, for a non-normal distribution); and
- if I make an intergroup comparison, I will have to use a "Welch test" or "Student t test with equal variances" (or a "Mann-Whitney test" for a non-normal distribution)

The reason these tests are chosen is very much linked to the type of variable (quantitative or qualitative).
If, for example, I wished to see the link between two quantitative variables (e.g., the link between the weight of a mouse and the length of its tail), then I would need to perform a Pearson (or Spearman) correlation.
And if I wished to see the link between two qualitative variables (e.g., the link between taking/not taking a pill and being sick/healthy), then I would most likely need to use a Chi2 test.

There is nonetheless a HUGE difference between the example mentioned above and my original example.
In the original example, the 7-point scale (called "likert scale" by those who developed it) is a qualitative scale (not quantitative).
It therefore means here that the measured variable is qualitative, not quantitative.

I hope this explains my viewpoint better.

#### Karabiner

##### TS Contributor
It is even called the "7-Point Likert Scale of Lower Limb Muscle Soreness" and was published in Clinical Journal of Sport Medicine (full reference : Impellizzeri FM, Maffiuletti NA. Convergent Evidence for Construct Validity of a 7-Point Likert Scale of Lower Limb Muscle Soreness: Clinical Journal of Sport Medicine 2007; 17(6): 494–496.)
Well, a single item cannot be a Likert scale (although many people confuse the response scale of the item with
the complete scale), and calling the answering format of that item a "Likert" format seems quite strange, given the
clear and simple definition of the Likert item. But anyway, the main problem now is not its labeling, but
which scale level it has and how it was used in the present study.
There is nonetheless a HUGE difference between the example mentioned above and my original example.
In the original example, the 7-point scale (called "likert scale" by those who developed it) is a qualitative scale (not quantitative).
It therefore means here that the measured variable is qualitative, not quantitative.
The huge difference is that you do not have 2 groups, therefore the example is not helpful.

If I understand you correctly, you want to find out whether there is a change with regard
to the 7-point measure between first and second assessment. So you can opt for using
the measurement as ordinal scaled (which makes very much sense in my opinion, since
I cannot see how the difference between for example "pain at rest" and "pain when walking"
can be proven to be exactly the same as the difference between "pain when walking" and
"pain when walking up the stairs") and use the sign test. Or, if your sources say otherwise,
and your reviewer accepts that, then you could treat the variable as interval scaled, and
use dependent samples t-test or the signed rank test.

With kind regards

Karabiner
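
The options discussed in this thread for one variable measured twice in the same subjects can be summarised as a small lookup. This is only a sketch of the advice given above, not a general rule; the function name `suggest_paired_test` and the three scale-level labels are chosen for illustration.

```python
def suggest_paired_test(scale_level: str) -> list[str]:
    """Map the assumed scale level of a twice-measured variable to the
    tests mentioned in this thread for that case."""
    options = {
        # repeated measures of a yes/no variable
        "dichotomous": ["McNemar test"],
        # ordered categories, spacing between levels unknown
        "ordinal": ["sign test"],
        # differences between levels are treated as meaningful
        "interval": ["paired t-test", "Wilcoxon signed rank test"],
    }
    key = scale_level.strip().lower()
    if key not in options:
        raise ValueError(f"unknown scale level: {scale_level!r}")
    return options[key]


print(suggest_paired_test("ordinal"))
print(suggest_paired_test("interval"))
```

The hard part is not the lookup but deciding which scale level the 7-point item can defensibly be treated as.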

#### katxt

##### Well-Known Member
Or a permutation test perhaps? kat
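
One possible reading of the permutation test idea is a sign-flip test on the paired differences, sketched below in plain Python. Note the assumption: summing differences treats the scores as numbers, so it carries the same scale-level caveat discussed above. The data and the function name `paired_sign_flip_test` are invented for illustration.

```python
from itertools import product


def paired_sign_flip_test(before, after):
    """Exact two-sided sign-flip permutation test on the sum of paired differences.

    Under H0 (no change over time) each difference is equally likely to be
    positive or negative, so every pattern of sign flips is equally likely.
    """
    # Zero differences contribute nothing to any flipped sum, so drop them.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    # Enumerates 2**n patterns, so this is only practical for small n
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed:
            extreme += 1
    return extreme / total


# Invented scores for 8 subjects (not real data)
before = [5, 6, 4, 7, 5, 3, 6, 5]
after = [4, 5, 4, 6, 5, 2, 5, 6]

p_value = paired_sign_flip_test(before, after)
print(p_value)
```

With many non-zero differences the full enumeration becomes slow, and a random sample of sign patterns would be the usual substitute.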