Similarity instead of differences in means between 2 groups

Is "failing to find a difference" an ok proxy for looking for a similarity?

  • It's absolutely good statistics. No problem!

    Votes: 0 0.0%
  • Not great...but it's widely done.

    Votes: 0 0.0%

  • Total voters
    4
#1
Hi - long time lurker. First time poster. I'm hoping folks can provide guidance and opinions.

I am looking at a small pilot study (n=16) of subjects whose heart rates are being measured as part of stress physiology. Heart rate is the continuous outcome variable (/min).

The subjects go through 2 possible situations. My hypothesis is that the 2 situations are similarly stressful, and I anticipate that the heart rates increase similarly in both situations.

All subjects go through both situations. The heart rates are not necessarily normally distributed. The catch is that the heart rates don't start at the same median values in each situation (e.g., Situation A starts them at HR 70 +/- 8 while Situation B starts them at HR 90 +/- 8), but they are predicted to increase by a set amount. So I can't directly compare the starting or ending HRs between the 2 situations, as those are already different and their difference is not due to the situation.

What I have set up so far is a Wilcoxon signed-rank test as a non-parametric alternative to a paired t-test, using the delta heart rate (d-HR) as the outcome variable. I calculated d-HR as each subject's peak HR minus starting HR for a trial. I only have 1 trial for Situation A and 4 trials for Situation B; unfortunately, I can't increase the number of trials for Situation A because of its rarity.
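A minimal sketch of that setup in Python, using `scipy.stats.wilcoxon` on the paired d-HR values. The data here are made up purely for illustration, and averaging the 4 Situation B trials per subject is one assumption about how to collapse them to a single paired value:

```python
# Sketch: Wilcoxon signed-rank test on paired delta-HR (d-HR) values.
# Illustrative synthetic data only; d-HR = peak HR minus starting HR.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 16  # subjects

d_hr_a = rng.normal(20, 5, n)                     # one trial per subject, Situation A
d_hr_b = rng.normal(20, 5, (n, 4)).mean(axis=1)   # mean of 4 trials, Situation B

# Paired, non-parametric comparison of the two sets of deltas
res = stats.wilcoxon(d_hr_a, d_hr_b)
print(res.statistic, res.pvalue)
```

Note that, as discussed below in the thread, a large p-value from this test is not by itself evidence of similarity.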

The problem is that the Wilcoxon is really looking for difference against a Null Hypothesis that they are similar. I actually would like to prove that they are similar, and I don't think I have enough subjects to do an actual Equivalence study. So currently I am essentially "failing to show a difference" with a high p-value and calling it a day.

This method feels tacky or flimsy to me. But I can't think of any other way to arrange the inferential analysis. Short of constructing an equivalence study, are there other ways to determine similarities in heart rate increases between 2 situations?

Thanks for your thoughts.
 
#2
> The problem is that the Wilcoxon is really looking for difference against a Null Hypothesis that they are similar. I actually would like to prove that they are similar, and I don't think I have enough subjects to do an actual Equivalence study. So currently I am essentially "failing to show a difference" with a high p-value and calling it a day. [...] Short of constructing an equivalence study, are there other ways to determine similarities in heart rate increases between 2 situations?
As far as I know, you need the equivalence study if you want any measure of statistical reliability attached to your inference. Failing to find a difference and then saying the groups are similar is not valid at all, and it is a common bad move perpetuated by many. It's not minutiae or "technicalities." I'll offer a simple example, but I also direct you to this paper, points 4-6 and 8.

If a 95% CI for the mean change in HR is (-5, +5), this indicates nonsignificance at the 5% level on a two-tailed test of the null hypothesis Ho: delta mu HR = 0. The logic others put forth is that failing to find significance means there is no difference between the groups, and that the CI supports this because 0 is in the interval. This is a mistaken understanding of the methods and their output. If the CI contains zero and we therefore conclude "no difference," why can we not equally conclude that the difference is -4, -3, -2, -1, 1, 2, 3, 4, or anything else in (-5, +5)? The center of the interval is not necessarily the value most compatible with the data, so many more hypotheses may explain our observations; therefore, we cannot conclude "no difference" by failing to reach significance. Finally, it is a logical fallacy to assert that an absence of evidence constitutes evidence of absence.


You're correct that the above method is flawed and outright incorrect. You're going to need an equivalence study; otherwise, the most you can conclude with frequentist testing is either that there is insufficient evidence of a true mean difference in HR between the populations at the chosen alpha level, or that there is sufficient evidence of a true mean difference in HR between the populations.
 

Karabiner

TS Contributor
#3
There is the simple yes/no approach of Null Hypothesis Significance Testing. A Bayesian approach, on the other hand, could deliver an estimate of the "situation" effect, as well as a credible interval. Of course, neither approach can compensate for too-sparse data.
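To illustrate the Bayesian idea: under a normal likelihood with flat (improper) priors, the posterior for the mean "situation" effect is a scaled t distribution, so a credible interval has a closed form. The paired differences below are invented for illustration:

```python
# Sketch: posterior for the mean effect under a normal model with vague
# priors; the posterior of the mean is a location-scale t distribution.
# Data are illustrative only.
import numpy as np
from scipy import stats

diffs = np.array([-3.0, 1.0, 2.0, -1.5, 0.5, -2.0, 1.5, 0.0])  # d-HR(A) - d-HR(B)
n = len(diffs)
m, s = diffs.mean(), diffs.std(ddof=1)

posterior = stats.t(df=n - 1, loc=m, scale=s / np.sqrt(n))
lo, hi = posterior.interval(0.95)
print(f"posterior mean {m:.2f}, 95% credible interval ({lo:.2f}, {hi:.2f})")
```

With informative priors (e.g., via PyMC or Stan) the interval could be narrower, but as noted above, no prior rescues genuinely sparse data.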

With kind regards

K.
 

hlsmith

Omega Contributor
#4
First of all, thanks for posting, Todd.


Before I make any recommendations, I think we need to double-check exactly what your data structure looks like. You have 16 subjects, and this is observational; you are just examining what has already happened in a clinic?


"Subjects go through 2 possible situations." So do all 16 patients get both tests, or not? Is this the rest test, no-rest test, or something like that, where patients failing the first test go on to the second? Provide a sample of what your data actually looks like; you can make up the values if you want.


Also, are you going to need to control for baseline differences in patients at all? I am not sure whether randomization or a crossover design was used, or whether the order of tests matters.


P.S., I am wondering if there is an equivalency permutation test out there that may work. Also, analyzing changes in values can be trickier than it seems at times.
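For what a paired permutation test might look like here: under the null of no situation effect, each within-subject difference in d-HR is equally likely to be positive or negative, so one can flip signs at random to build a null distribution. This is a generic sign-flip permutation sketch with invented data, not a ready-made equivalence procedure:

```python
# Sketch: paired (sign-flip) permutation test on within-subject differences
# in delta-HR. Data are made up for illustration.
import numpy as np

rng = np.random.default_rng(42)
diffs = np.array([-3.0, 1.0, 2.0, -1.5, 0.5, -2.0, 1.5, 0.0])
observed = abs(diffs.mean())

# Randomly flip the sign of each subject's difference many times
flips = rng.choice([-1.0, 1.0], size=(10_000, diffs.size))
null_means = np.abs((flips * diffs).mean(axis=1))

p = (null_means >= observed).mean()
print("two-sided permutation p-value:", p)
```

Turning this into an equivalence test would still require a pre-specified margin, as discussed earlier in the thread; the permutation machinery only replaces the distributional assumptions.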
 
#5
I would like to clarify that I meant only positive comments, and I apologize if my post came off as negative towards Todd in any way... I only intended to help!