I am looking at a small pilot study (n=16) of subjects whose heart rates are being measured as part of a stress physiology experiment. Heart rate (beats/min) is the continuous outcome variable.

The subjects go through 2 possible situations. My hypothesis is that the 2 situations are similarly stressful, and I anticipate that the heart rates increase similarly in both situations.

All subjects go through both situations. The heart rates are not necessarily normally distributed. The catch is that the heart rates don't start at the same median values in each situation (e.g. Situation A starts them at HR 70 +/- 8 while Situation B starts them at HR 90 +/- 8), but they are predicted to increase by a set amount. So I can't directly compare the starting HRs or ending HRs between the 2 situations: they are definitely different already, and that difference is not due to the situation.

What I have set up so far is

**a Wilcoxon signed-rank test** as an alternative to a paired t-test, using the delta heart rate (d-HR) as the outcome variable. I calculated d-HR as peak HR minus starting HR for each subject's trial. I only have 1 trial for situation A and 4 trials for situation B. I can't increase the number of trials for situation A, unfortunately, because of its rarity.
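For concreteness, here is a minimal sketch of that setup in Python with SciPy. The numbers are synthetic placeholders, not my data, and the variable names are made up for illustration. It also assumes the 4 situation B trials are collapsed to one value per subject by taking the median, which is just one possible choice and not something I have settled on.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
n_subjects = 16

# Synthetic placeholder data (NOT real measurements), in beats/min.
# Situation A: 1 trial per subject; situation B: 4 trials per subject.
start_a = rng.normal(70, 8, size=n_subjects)
peak_a = start_a + rng.normal(20, 5, size=n_subjects)
start_b = rng.normal(90, 8, size=(n_subjects, 4))
peak_b = start_b + rng.normal(20, 5, size=(n_subjects, 4))

# d-HR = peak HR minus starting HR, per subject and trial.
d_hr_a = peak_a - start_a
# Assumption: summarize the 4 B trials by their per-subject median
# so that each subject contributes one paired value.
d_hr_b = np.median(peak_b - start_b, axis=1)

# Paired Wilcoxon signed-rank test on the two d-HR vectors.
result = wilcoxon(d_hr_a, d_hr_b)
print(f"W = {result.statistic:.1f}, p = {result.pvalue:.3f}")
```

Collapsing the B trials throws away the within-subject variability across trials, so this is only meant to show the shape of the comparison.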

The problem is that the Wilcoxon test looks for a difference, against a null hypothesis that the two situations are similar. I actually would like to show that they are similar, and I don't think I have enough subjects to do an actual equivalence study. So currently I am essentially "failing to show a difference" with a high p-value and calling it a day.

This method feels tacky or flimsy to me, but I can't think of any other way to arrange the inferential analysis. Short of constructing an equivalence study, are there other ways to assess similarity in heart rate increases between 2 situations?

Thanks for your thoughts.