Just my 2 cents.
First, do not think of the Mann-Whitney (MW) test as a test of medians. That interpretation holds only under specific conditions.
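To make the point about medians concrete, here is a minimal sketch with made-up numbers (the samples `x` and `y` and the helper `prob_superiority` are purely illustrative, not anyone's real data). Both samples have the same median, yet the quantity the MW test is built on, the estimated probability that a value from one group is smaller than a value from the other, is well away from 0.5.

```python
import statistics

def prob_superiority(x, y):
    """Estimate P(X < Y) + 0.5 * P(X = Y).

    This equals one of the two MW U statistics divided by len(x) * len(y).
    """
    wins = sum((xi < yi) + 0.5 * (xi == yi) for xi in x for yi in y)
    return wins / (len(x) * len(y))

# Hypothetical samples: same median (0), very different shapes.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [-0.3, -0.2, -0.1, 0, 10, 20, 30]

print("median x:", statistics.median(x))
print("median y:", statistics.median(y))
print("estimated P(X < Y):", round(prob_superiority(x, y), 3))
```

If the two distributions had the same shape and differed only by a shift, a difference in this probability would go hand in hand with a difference in medians, which is the special case people usually have in mind.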
Secondly, are you planning to use those tests as a kind of predictor selection? I do not think that would be a sound approach.
Well, that first histogram is extremely skewed and I would not use a t-test on it. Think about when to use the t-test, and perhaps also about when you are comfortable reporting the mean and 95% confidence interval. Would you be comfortable doing that with the variable in the first histogram? Also, I did not see any Q-Q plots.
You can use the chi-square test with 2x2 tables; that is completely acceptable. ANOVA is meant for 3 or more groups; with only two groups it gives the same result as the equal-variance t-test.
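For what it's worth, the chi-square test on a 2x2 table is simple enough to sketch in plain Python. The counts below are invented and the function name `chi2_2x2` is just for illustration; this version applies no continuity correction, so a statistics package with the correction switched on would give a somewhat different p-value.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    No continuity correction is applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Shortcut formula for 2x2 tables, equivalent to sum((O - E)^2 / E).
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: rows = group, columns = outcome yes / no.
stat, p_value = chi2_2x2(30, 20, 15, 35)
print(f"chi2 = {stat:.3f}, p = {p_value:.4f}")
```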
I've checked the specific conditions for the MW test to be a test of medians and it's OK.
The histograms have more or less the same shape, but I would like to do a t-test, provided the results are valid and interpretable, of course.
Besides, doesn't the central limit theorem (CLT) say that if your sample is sufficiently large you can always use a t-test?
Yes, the t-test is robust to deviations from the usual assumptions (symmetry and constant variance), but it is not robust to outliers. And from the histograms it looks like there are outliers at the upper and lower ends of the scale. These will be very influential observations.
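A small sketch of how influential a single observation can be, using invented numbers and Welch's t statistic computed by hand (the helper `welch_t` and both groups are illustrative only). One extreme value both drags the mean and inflates the variance, so the statistic changes sign and shrinks.

```python
import math
import statistics

def welch_t(x, y):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    se = math.sqrt(vx / len(x) + vy / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se

# Hypothetical data: group_b sits about 3 units above group_a.
group_a = [10, 11, 12, 13, 14, 15, 16, 17]
group_b = [13, 14, 15, 16, 17, 18, 19, 20]

t_clean = welch_t(group_a, group_b)

# Replace the last value of group_a with one extreme outlier.
group_a_outlier = group_a[:-1] + [200]
t_outlier = welch_t(group_a_outlier, group_b)

print(f"t without outlier:  {t_clean:.2f}")
print(f"t with one outlier: {t_outlier:.2f}")
```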
The first histogram looks very skewed. Maybe by taking logs (several times) or the square root (several times) it can be made more symmetric. The last one looks more like a double exponential (Laplace distribution) than a normal. But even there, there are some outliers.
Otherwise I agree with the OP that the CLT would take care of the problems, but the outliers make me hesitate.
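The CLT argument can be checked with a quick simulation. This is only a sketch under stated assumptions: the data are drawn from an exponential distribution (skewed, but without the kind of gross outliers discussed here), and the sample sizes, the number of repetitions and the seed are arbitrary choices. It shows the distribution of the sample mean becoming more symmetric as n grows.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the average cubed z-score (population form)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def sample_means(n, reps, rng):
    """Means of `reps` samples of size `n` from an exponential(1) distribution."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

rng = random.Random(42)  # fixed seed so the run is repeatable
skew_small = skewness(sample_means(5, 2000, rng))
skew_large = skewness(sample_means(200, 2000, rng))

print(f"skewness of sample means, n = 5:   {skew_small:.2f}")
print(f"skewness of sample means, n = 200: {skew_large:.2f}")
```

How large n has to be depends on how heavy the tails are, which is why the outliers remain a concern even with a big sample.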
But if the purpose is to run a logistic model and use these variables as explanatory variables, then it does not matter. There are no distributional assumptions on the x-variables in a regression model, logistic or not.
Well, I know the MW test is not in general a test of medians, but under certain conditions it is, and those conditions are satisfied by my variables.
I can't do any transformations (square root or log) because the left skew you see is mostly due to the value 0 (companies having no financial debt or supplier debt), and for the other variables I have some negative values, which isn't good either if I read this correctly. As you know, the log of 0 is undefined, and neither the log nor the square root is defined for negative values.
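A quick check of where these transformations are actually defined, as a minimal sketch (the helper `try_transform` and the three test values are illustrative). In real arithmetic the square root of 0 is simply 0; it is the log that breaks at 0, and both break for negative values. `math.log1p`, i.e. log(1 + x), is included as one common workaround for zeros, though it still fails for values at or below -1.

```python
import math

def try_transform(func, value):
    """Return func(value), or None where the transform is undefined."""
    try:
        return func(value)
    except ValueError:
        return None

# Hypothetical values: a zero, a negative and a positive amount.
for value in (0.0, -2.5, 4.0):
    print(
        f"x = {value:5}:",
        "log =", try_transform(math.log, value),
        "| sqrt =", try_transform(math.sqrt, value),
        "| log1p =", try_transform(math.log1p, value),
    )
```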
The explanation I attached (see pic) describes why you need to run a t-test.
In a lot of papers similar to mine the t-test is used, but I never see them state that their variables are normally distributed.
They even run the t-test for categorical variables, which I don't understand.
I also winsorized at the 1st and 99th percentiles, so the impact of outliers should be mitigated, no?
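For readers unfamiliar with winsorizing, here is a minimal sketch of the idea in plain Python. The function `winsorize`, its simple index-based percentile rule and the toy data are all assumptions for illustration; statistical packages use their own percentile conventions, so the exact cut-offs can differ slightly.

```python
def winsorize(values, lower=0.01, upper=0.99):
    """Clamp values to the empirical lower/upper percentiles.

    Uses a simple index-based rule on the sorted data to pick the cut-offs.
    """
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int(lower * (n - 1))]
    hi = ordered[int(upper * (n - 1))]
    # Values outside [lo, hi] are pulled in to the nearest cut-off.
    return [min(max(v, lo), hi) for v in values]

# Hypothetical data: 99 ordinary values plus one extreme observation.
data = list(range(1, 100)) + [10_000]
clipped = winsorize(data)

print("max before:", max(data))
print("max after: ", max(clipped))
```

Note that winsorizing only caps the most extreme 1% on each side; if the tails are heavy, the values that remain just inside the cut-offs can still be influential.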
I'm going to use the t-test and refer to your second paper, Greta. I think this should be OK (correct me if I'm wrong).
I would like to thank all of you already for your responses and help.