Association/correlation between binary and continuous data with a non-normal distribution

#1
Hi,
I want to perform an association analysis between a binary (dichotomous) variable and a continuous variable. The binary variable records whether something is present or not: "Yes" or "No", "1" or "0", etc. The continuous variable takes values between ~150 and ~170. It is not normally distributed: there are more low values than high values. My question is whether high or low values of the continuous variable correlate with the 1 or 0 of the binary variable. For example, do low values correlate with "1"? My sample size is ~150.

I have tried a point-biserial correlation test and a Spearman's rho test so far. I'm not sure whether either of them is the right one. Can someone give me some advice on this?

Many Thanks!
 

Karabiner

TS Contributor
#2
Compare the means of your continuous variable between the "yes" and "no" groups (t-test, or rather Welch's test).
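For reference, a minimal sketch of that Welch comparison in plain Python. The data and the helper name `welch_t` are made up for illustration. Only the test statistic and the Welch-Satterthwaite degrees of freedom are computed; the p-value would then be read from a t distribution with `df` degrees of freedom (for instance, `scipy.stats.ttest_ind` with `equal_var=False` does the whole computation).

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    # Sample variances (n - 1 denominator), each divided by its group size.
    se2_a = statistics.variance(group_a) / n_a
    se2_b = statistics.variance(group_b) / n_b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical day-of-year values, not the poster's data.
days_yes = [151, 152, 152, 154, 155, 157]
days_no = [156, 158, 160, 161, 165, 168]

t_stat, dof = welch_t(days_yes, days_no)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

A negative `t` here means the "yes" group has the lower mean.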

With kind regards

Karabiner
 

hlsmith

Not a robit
#3
Yeah, I was going to propose the Wilcoxon rank sum test. Is your continuous variable bounded between ~150 and ~170, or was that just where most values landed?
 
#4
Thank you!

The continuous variable is bounded between ~150 and ~170. If this is a problem I can shift the numbers to ~0-20, but I don't think it is.

A t-test and a point-biserial correlation test are basically the same thing, is that right? When I apply a t-test, I get a really low p-value for every case I'm testing.
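On the equivalence: the point-biserial correlation is Pearson's r with a 0/1 variable, and the statistic t = r * sqrt((n - 2) / (1 - r^2)) is identical to the equal-variance (Student) two-sample t statistic. It is not identical to the Welch statistic. A small sketch with made-up data that checks this numerically (the helper names are only illustrative):

```python
import math
import statistics

def pearson_r(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pooled_t(group_1, group_0):
    """Student's two-sample t statistic with a pooled variance."""
    n1, n0 = len(group_1), len(group_0)
    pooled_var = ((n1 - 1) * statistics.variance(group_1)
                  + (n0 - 1) * statistics.variance(group_0)) / (n1 + n0 - 2)
    diff = statistics.fmean(group_1) - statistics.fmean(group_0)
    return diff / math.sqrt(pooled_var * (1 / n1 + 1 / n0))

# Hypothetical data: 1 = present, 0 = absent; day = day of the year.
present = [1, 1, 1, 1, 0, 0, 0, 0, 0]
day = [151.0, 153.5, 152.0, 156.0, 158.0, 160.5, 155.0, 163.0, 167.0]

r_pb = pearson_r(present, day)
n = len(day)
t_from_r = r_pb * math.sqrt((n - 2) / (1 - r_pb ** 2))
t_pooled = pooled_t([d for p, d in zip(present, day) if p == 1],
                    [d for p, d in zip(present, day) if p == 0])
print(f"r_pb = {r_pb:.4f}, t from r = {t_from_r:.4f}, pooled t = {t_pooled:.4f}")
```

The two t values agree up to floating-point error, so both tests give the same p-value on n - 2 degrees of freedom.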
 

hlsmith

Not a robit
#5
I was asking about the bounds because, when a continuous variable is bounded, you can often get confidence intervals that extend beyond the allowable range. For example, an estimate of 99% on a scale capped at 100% might get a 95% CI of 94% to 109%, which is nonsensical.

What are you trying to say with the results? Also, can the continuous variable take non-integer values, e.g., 156.89?


Given your data, I would think an exact (Monte Carlo) Wilcoxon rank sum test would be appropriate. Another person on this forum would likely also recommend a permutation test, based on, say, the t-test framework.
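A rough sketch of what a Monte Carlo version of the rank sum test could look like, using only the standard library. Tied values get their average rank (midranks), and the p-value is one-sided for the question of whether group "1" tends to have lower values. The data, function names, number of resamples, and seed are all assumptions for illustration; packaged routines such as `scipy.stats.mannwhitneyu` would normally be used instead.

```python
import random

def midranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def rank_sum_permutation_test(group_1, group_0, n_resamples=10_000, seed=0):
    """One-sided Monte Carlo p-value for 'group_1 tends to be lower'."""
    rng = random.Random(seed)
    ranks = midranks(list(group_1) + list(group_0))
    n1 = len(group_1)
    observed = sum(ranks[:n1])
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(ranks)  # random reassignment of ranks to the two groups
        if sum(ranks[:n1]) <= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (hits + 1) / (n_resamples + 1)

# Hypothetical day-of-year values, not the poster's data.
days_yes = [151, 152, 152, 154, 155, 157]
days_no = [156, 158, 160, 161, 165, 168]

observed, p_value = rank_sum_permutation_test(days_yes, days_no)
print(f"rank sum of 'yes' group = {observed}, one-sided p = {p_value:.4f}")
```

Because the reference distribution is built by reshuffling the observed ranks, ties are handled without any separate correction.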
 
#6
The continuous variable is the day of the year. The event whose occurrence I am testing based on condition "0" or "1" can occur approximately 150 to 170 days after January 1st. Does this mean I have ties in my data? The values can also be floats, as I am also working with the mean value over several years.
With the result I am trying to tell whether condition "1" leads to lower values of the continuous variable, i.e., whether the event I am testing occurs earlier when condition "1" occurs.
 
#7
T-tests are used to determine if there is a difference between groups. For measures of association between a continuous and a dichotomous variable, point biserial or crude odds ratio from logistic regression can be used. However, the point-biserial correlation is the same as Pearson's r, and therefore the same assumptions should be satisfied (normality, no outliers, linearity, etc.). I suggest using the crude odds ratio from logistic regression.
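As an illustration of the crude odds ratio, here is a bare-bones sketch that fits a one-predictor logistic regression by Newton-Raphson in plain Python and exponentiates the slope. The data and the function name `fit_simple_logistic` are invented for the example, the predictor is centered only for numerical stability (this does not change the slope), and no standard errors or confidence intervals are computed. A real analysis would more likely use a statistics package.

```python
import math

def fit_simple_logistic(x, y, n_iter=25):
    """Fit logit P(y=1) = b0 + b1 * (x - mean(x)) by Newton-Raphson."""
    x_bar = sum(x) / len(x)
    xc = [v - x_bar for v in x]
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of X'WX (negative Hessian)
        for xi, yi in zip(xc, y):
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
            w = p * (1 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (X'WX) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical, deliberately overlapping data (perfectly separated
# groups would make the maximum-likelihood estimate diverge).
day = [151, 152, 153, 154, 155, 156, 157, 158, 160, 161, 163, 165, 166, 168]
present = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

b0, b1 = fit_simple_logistic(day, present)
odds_ratio = math.exp(b1)
print(f"slope = {b1:.4f}, odds ratio per one-day increase = {odds_ratio:.4f}")
```

An odds ratio below 1 means the odds of "1" fall as the day of the year increases, i.e. "1" goes with earlier dates.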
 

Karabiner

TS Contributor
#8
"T-tests are used to determine if there is a difference between groups. For measures of association between a continuous and a dichotomous variable, point biserial or crude odds ratio from logistic regression can be used."
If there is a difference between groups regarding the continuous variable,
then there is an association between the grouping variable and the continuous
variable. The mean difference or the standardized difference (Cohen's d) could
be meaningful measures of this association.
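A short sketch of both measures, assuming the usual pooled-standard-deviation definition of Cohen's d. The data and the function name are hypothetical.

```python
import math
import statistics

def cohens_d(group_1, group_0):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n0 = len(group_1), len(group_0)
    pooled_sd = math.sqrt(((n1 - 1) * statistics.variance(group_1)
                           + (n0 - 1) * statistics.variance(group_0))
                          / (n1 + n0 - 2))
    return (statistics.fmean(group_1) - statistics.fmean(group_0)) / pooled_sd

# Hypothetical day-of-year values, not the poster's data.
days_yes = [151, 152, 152, 154, 155, 157]
days_no = [156, 158, 160, 161, 165, 168]

mean_diff = statistics.fmean(days_yes) - statistics.fmean(days_no)
d = cohens_d(days_yes, days_no)
print(f"mean difference = {mean_diff:.2f} days, Cohen's d = {d:.2f}")
```

The raw mean difference stays in the original unit (days), which may be easier to interpret here than the standardized value.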

With kind regards

Karabiner