Nonparametric alternative to point-biserial correlation coefficient? - Which? t-test or Mann Whitney?

#1
A. I know from the net that for a design with one binary variable and a second variable that is continuous but is NOT normally distributed, I can use BOTH the point-biserial correlation (which is basically the parametric Pearson correlation formula) as well as the Rank Biserial Correlation (which is equal to the nonparametric Spearman or Kendall τ correlations).

B. I also have read about linear and monotonic correlations, which implies that even the Pearson coefficient (and of course, point-biserial) is OK for nonnormal distributions.

C. And I understand that with one binary variable and a continuous one, it might even be better to use an independent-samples comparison test (e.g., unpaired t or Mann-Whitney U) instead of a correlation coefficient.

D. I know that the Mann-Whitney test will yield the same p value as the Spearman coefficient, while the t-test will give the same p value as the Pearson coefficient (and point-biserial).

E. I know when the groups are not large enough AND when the error terms are not normally distributed, I should use the nonparametric Mann-Whitney instead of t-test.
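
Statement D can be checked numerically. The following is a minimal sketch in Python with SciPy on simulated skewed data; the lognormal parameters, the seed, and the variable names are assumptions for illustration, not the patient data discussed in this thread. It suggests that the point-biserial (Pearson) p-value matches the pooled-variance t-test exactly. The Spearman and Mann-Whitney p-values are usually close but need not be identical, because (as far as I know) SciPy refers Spearman's rho to a t distribution and, at this sample size, the Mann-Whitney U to a normal approximation with continuity correction.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data: two groups of 20 with a skewed (lognormal) outcome.
a = rng.lognormal(mean=0.0, sigma=1.0, size=20)   # treatment A
b = rng.lognormal(mean=0.5, sigma=1.0, size=20)   # treatment B
group = np.r_[np.zeros(20), np.ones(20)]          # binary variable (0 = A, 1 = B)
length = np.r_[a, b]                              # continuous variable

# Parametric pair: point-biserial correlation vs pooled-variance t-test.
r_pb, p_pb = stats.pointbiserialr(group, length)
t, p_t = stats.ttest_ind(a, b, equal_var=True)

# Rank-based pair: Spearman correlation vs Mann-Whitney U.
rho, p_rho = stats.spearmanr(group, length)
u, p_u = stats.mannwhitneyu(a, b, alternative="two-sided")

print(f"point-biserial p = {p_pb:.6f}, pooled t-test p = {p_t:.6f}")
print(f"Spearman p       = {p_rho:.6f}, Mann-Whitney p  = {p_u:.6f}")
```

The parametric equality follows from the identity |t| = |r| * sqrt((n - 2) / (1 - r^2)) with n = 40, so the two tests are the same test written in two ways.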


Two Questions:

The above assumptions cause some inconsistencies and confusion in the following case:

I am analyzing this design with 2 groups of 20 patients each; the independent variable is Treatment (the treatments A or B) and the dependent variable is the continuous Length measured in each group. The latter is NOT normal (the groups and the error terms are all NON-normal).

The problem is that in this particular design, the nonparametric Spearman and Mann-Whitney tests yield a statistically significant p value, while the parametric point-biserial [Pearson] and t-test yield a quite non-significant p value > 0.1.

Question 1. Which one should I use? The nonparametric Spearman / Mann-Whitney? Or the parametric point-biserial [Pearson] / t-test? On the one hand, the assumption E dictates that I must use the nonparametric Mann-Whitney. On the other hand, the assumptions A and B allow me to use the parametric point-biserial [which is actually Pearson] correlation and by extension the t-test. So what should I use?

Question 2. The assumption E seems to be in total conflict with the assumptions A and B: The results of Spearman / Mann-Whitney are identical, and so are the results of point-biserial [Pearson] / t-test. So if I am allowed to use the point-biserial [Pearson] in the absence of normality (assumptions A and B), why not the t-test which gives the EXACT SAME result as point-biserial?
 

Karabiner

TS Contributor
#2
Question 1. Which one should I use?
It is a bit unpleasant to make such a choice after the p-values are known. - You could
maybe choose according to what you want to compare, means (which are often heavily
influenced by outliers in case of "time" variables), or ranks.
So if I am allowed to use the point-biserial [Pearson] in the absence of normality (assumptions A and B), why not the t-test which gives the EXACT SAME result as point-biserial?
Do the sources which "allow" the point-biserial refer to the statistical test of significance,
or do they just state that the coefficient is a valid expression of the association, regardless
of the distribution of the variable?

By the way, n=40 would usually be considered a large enough sample to allow a t-test,
or preferably: a Welch-corrected t-test.

With kind regards

Karabiner
 
#3
You could
maybe choose according to what you want to compare, means (which are often heavily
influenced by outliers in case of "time" variables), or ranks.
Thanks for the nice answer and advice. I think in this study (and actually most other studies on continuous data), the means are much more important than the ranks.
Can you direct me to any references for this advice? Some books or articles? I need to cite this reference to justify my test in use because we have always been told that regardless of the parameter of interest (means or ranks), we should always use nonparametric tests when the sample is small and nonnormal.

It is a bit unpleasant to make such a choice after the p-values are known.
I understand. Before learning the assumptions A and B in my original post, I would not bother at all to contemplate the test in use. I would select the Mann-Whitney or Spearman right away without thinking twice. But the problem started when I learned that it is strangely allowed to use Pearson / point-biserial on small, non-normal data. Why? Because they basically give the same p-value that the forbidden t-test gives.

Still, your advice to select the parameter of interest (means versus ranks) can help solve this dilemma. I hope there is some reference for it out there.

But as stated above, this is not what they had taught me in all these years. They had always told me to use a non-parametric test on small, nonnormal data regardless of the parameter of interest (means versus ranks). And now with learning these new STRANGE assumptions about the Pearson / point-biserial correlation (the assumptions A and B above), I am quite perplexed. If the results of parametric and nonparametric tests can differ so much in the case of nonnormal data, how can they both be allowed on nonnormal data?



Do the sources which "allow" the point-biserial refer to the statistical test of significance,
or do they just state that the coefficient is a valid expression of the association, regardless
of the distribution of the variable?
As far as I have noticed, I think the sources do not make a distinction between these two statistics (the correlation coefficient [the extent of association] versus the coefficient's p-value [the statistical test of significance]).

I myself think these two (the effect size [r or the correlation coefficient] and the statistical significance [the p-value]) are intermingled and inseparable. One cannot talk about one of these two without indirectly talking about the other one, I think.

But if I am mistaken, please let me know and be so kind as to elaborate or possibly give some references. Thanks a lot for walking me through.

And if you agree with me on these two being two sides of one single coin, then let's think about why such self-contradictory assumptions have been placed for t-test versus Pearson, of course if you are interested.



By the way, n=40 would usually be considered a large enough sample to allow a t-test...
1. Do you think the central limit theorem holds because the total sample (both groups combined = 40) is greater than 30?
2. Or do you think so because each group has 20 data points, which is a number close to 30? [I think this is the case]
3. In any case, can you point me to some reference too?

I always thought that the CLT kicks in when each group is as large as a minimum of 30 specimens. This particular study has only 20 data points per group.

... , or preferably: a Welch-corrected t-test.
Sure, and thanks for reminding me. I fully agree that a Welch-corrected t-test is better than a Mann-Whitney U. But a Welch-corrected t-test does not give the same p-value as Pearson and is hence not my concern right now. My problem is with the simple, ordinary t-test, which yields p-values exactly equal to those of the Pearson coefficient, causing all this conflict and controversy (i.e., while Pearson is allowed on non-normal data, the t-test, which gives the exact same result, isn't allowed).
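
How far the Welch p-value drifts from the pooled (and hence Pearson) p-value can be sketched as follows. This is a minimal illustration on simulated data; the distributions, the seed, and the variable names are assumptions. One detail relevant to a 20-versus-20 design: with equal group sizes the Welch and pooled t statistics coincide, and only the degrees of freedom differ, so the Welch p-value can only be equal to or somewhat larger than the pooled one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical data: equal group sizes (20 each), unequal spread.
a = rng.lognormal(mean=0.0, sigma=0.5, size=20)
b = rng.lognormal(mean=0.3, sigma=1.0, size=20)

pooled = stats.ttest_ind(a, b, equal_var=True)    # Student's t-test, df = 38
welch = stats.ttest_ind(a, b, equal_var=False)    # Welch's t-test, df <= 38

print(f"pooled: t = {pooled.statistic:.4f}, p = {pooled.pvalue:.4f}")
print(f"Welch:  t = {welch.statistic:.4f}, p = {welch.pvalue:.4f}")
```

With unequal group sizes the two statistics would generally differ as well, which is where the Welch correction matters most.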
 

Karabiner

TS Contributor
#4
Can you direct me to any references for this advice? Some books or articles? I need to cite this reference to justify my test in use because we have always been told that regardless of the parameter of interest (means or ranks), we should always use nonparametric tests when the sample is small and nonnormal.
This is just my daily practice, I am afraid. In the present situation, I would assume that the sample
does not need to be considered small (total n > 30), therefore a t-test/Welch test could be justified.
I myself think these two (the effect size [r or the correlation coefficient] and the statistical significance [the p-value]) are intermingled and inseparable. One cannot talk about one of these two without indirectly talking about the other one, I think.
I cannot quite follow, I'm afraid. A correlation coefficient, a mean difference, etc. are what
they are. Whether the assumptions of the statistical significance test are fulfilled is
a different matter. Hence my question whether your sources might be talking only about the
calculation of the coefficient, not the test.
1. Do you think the central limit theorem holds because the total sample (both groups combined = 40) is greater than 30?
n=30 is what I have often seen presented as sufficient sample size, and it was sufficient in some
simulations I have seen.
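
A simulation of this kind can be sketched in a few lines. The version below is only an illustration under stated assumptions (lognormal data, 20 per group, 5000 replications, a fixed seed), not the simulations referred to above. Both groups are drawn from the same skewed distribution, so every rejection at the 5% level is a false positive.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, n_per_group = 5000, 20

# Null hypothesis is true: both groups share one skewed distribution.
a = rng.lognormal(mean=0.0, sigma=1.0, size=(n_sims, n_per_group))
b = rng.lognormal(mean=0.0, sigma=1.0, size=(n_sims, n_per_group))

# One test per simulated study (row-wise along axis=1).
p_pooled = stats.ttest_ind(a, b, axis=1, equal_var=True).pvalue
p_welch = stats.ttest_ind(a, b, axis=1, equal_var=False).pvalue

rate_pooled = np.mean(p_pooled < 0.05)
rate_welch = np.mean(p_welch < 0.05)
print("pooled t-test false-positive rate:", rate_pooled)
print("Welch t-test false-positive rate: ", rate_welch)
```

Rates near the nominal 0.05 would support the rule of thumb for this particular distribution; heavier skew, outliers, or unequal group sizes could behave differently, so the sketch should be rerun with a distribution resembling the data at hand.
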
2. Or do you think so because each group has 20 data points, which is a number close to 30?
The test is of the difference between 2 means, and this difference is based on n=40 observations.
A t-test is 100% analogous to a one-way analysis of variance with 2 groups, or a simple linear
regression with a dummy predictor, and in both cases one would consider just the 40 residuals.
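
The equivalence described above can be illustrated with a minimal sketch; the simulated exponential data, the seed, and the variable names are assumptions. The pooled-variance t-test, the two-group one-way ANOVA, and the regression on a 0/1 dummy return the same p-value, and the regression slope is simply the difference between the two group means.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical data: two groups of 20 with a skewed outcome.
a = rng.exponential(scale=1.0, size=20)
b = rng.exponential(scale=1.5, size=20)
dummy = np.r_[np.zeros(20), np.ones(20)]   # 0 = group A, 1 = group B
y = np.r_[a, b]

p_t = stats.ttest_ind(a, b, equal_var=True).pvalue   # pooled t-test
p_f = stats.f_oneway(a, b).pvalue                    # one-way ANOVA, 2 groups
reg = stats.linregress(dummy, y)                     # regression on the dummy

# The 40 residuals whose distribution the shared assumptions refer to.
residuals = y - (reg.intercept + reg.slope * dummy)

print(f"t-test p = {p_t:.6f}, ANOVA p = {p_f:.6f}, regression p = {reg.pvalue:.6f}")
print(f"slope = {reg.slope:.4f}, mean difference = {b.mean() - a.mean():.4f}")
```

Because the fitted values are just the two group means, the residuals are the deviations of each observation from its own group mean, which is why all 40 are considered together.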

With kind regards

Karabiner
 