Reviewer feedback - In the statistical section, why log transformation were not done for normality and equal variance

Hi stat friends! Can I please get some feedback/a sense check on my statistical approach based on reviewer comments of a scientific paper I am submitting? Is there anything really dumb in here or anything crucial I'm missing out? Thank you all for your wonderful help!

Here is an excerpt from my statistical section:

All data were checked for normality (Shapiro-Wilk) and homogeneity of variance (Levene’s test). To test for a (cattle) browsing effect on foliar biomass (of trees) within species, the averages were compared prior to and post the cattle trial. When data were not normally distributed and (i) had comparable variances, a Mann-Whitney test was used (Species 1, Species 2); (ii) did not have comparable variances, a Mood’s median test was used (Species 3) [21].

I recently got feedback on a manuscript I submitted: "In the statistical section, why log transformation were not done for normality and equal variance test before using Mann Whitney test/Mood's Median test?"

Here is my response, I've attached exploratory plots of the data for Species 1 (Levene's test P>0.05, i.e. variances between samples are not different):

Non parametric testing for Species 1 and Species 2:
  • The data chosen for non parametric testing demonstrated a strong ‘left skew’.
  • The data contained a majority of ‘0’ values, as the cattle completely defoliated the majority of the trees.
  • Exploratory tests were used as follows: the Shapiro-Wilk test indicated that the data were not normally distributed, and Levene’s test, used to check for homogeneity of variances, found that the variances were not significantly different.
  • The data met the assumptions of the Mann-Whitney-Wilcoxon test (wilcox.test with continuity correction), which assumes (1) the samples come from distinct populations, (2) the samples do not affect one another and (3) the populations have similar shapes of distribution and similar variances.
  • We did also explore transformation of the left-skewed data using logarithmic and square root transformations. Neither transformation resulted in a Gaussian distribution. The square root transformation most closely resembled a Gaussian distribution, and a Welch two-sample t test on the transformed data found a significant difference in the means of the samples (P<0.01).
Non parametric testing for Species 3:
  • In the case of Species 3 the same process as that described above was followed. The post-browsing data contained a majority of ‘0’ values, as the cattle completely defoliated the majority of the trees. Transformation would have been of limited use.
  • The result of Levene’s test was P<0.05, so the variances were not homogeneous and therefore a Mood’s median test was used.
Ultimately, the decision to use non-parametric testing was based on the following considerations:
(a) It was felt that the median was a more appropriate representation of the data than the mean due to outliers.
(b) The outliers and zero-values were valid results and were included in the analysis.
(c) Despite being a weaker statistical test, the assumptions of the non-parametric tests were met and the results of testing represented the observed change.
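
For readers unfamiliar with Mood's median test, here is a minimal sketch of the idea in plain Python: count how many values in each group fall above the pooled median, then run a chi-square test on the resulting 2x2 table. The biomass values are made up for illustration (they are not the data from the manuscript), the function name is hypothetical, and no continuity correction is applied, so the p-value may differ from what a statistics package reports with its defaults.

```python
import math
import statistics

def moods_median_test(x, y):
    """Mood's median test for two groups (no continuity correction)."""
    grand_median = statistics.median(list(x) + list(y))
    groups = (x, y)
    # Row 0: values above the pooled median; row 1: values at or below it.
    above = [sum(v > grand_median for v in g) for g in groups]
    not_above = [len(g) - a for g, a in zip(groups, above)]
    table = [above, not_above]
    row_totals = [sum(above), sum(not_above)]
    col_totals = [len(x), len(y)]
    n = len(x) + len(y)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("test undefined: an expected count would be zero")
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / n
            chi2 += (table[r][c] - expected) ** 2 / expected
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return grand_median, chi2, p_value

# Made-up foliar biomass values; post-browsing is mostly zeros.
pre = [12.0, 15.5, 9.8, 20.1, 14.2, 11.0]
post = [0.0, 0.0, 0.0, 0.0, 1.2, 3.4]

grand_median, chi2, p_value = moods_median_test(pre, post)
print(grand_median, chi2, p_value)
```

Because the test only uses whether each value is above the pooled median, it makes no assumption about equal variances, which is the reason given above for using it with Species 3.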

Does this sound ok?



Ambassador to the humans
I typically disagree with doing tests for normality or equal variance before running models but that's a different discussion.

I disagree with your last point (c) in that non-parametric tests are weaker. They are really only weaker when all of the assumptions of the parametric test are met perfectly. And even then it's not *that* much of a difference in power. I would personally remove the "despite being a weaker statistical test" portion of your response.

With all of that said... the non-parametric tests you mention I believe are invariant to monotonic transformations (which the log transformation is). You can give that a try too - just log transform the data and see if you get the same p-values. The route of least resistance would be to just do that to shut the reviewer up but if you want to have some honor you might fight back at them by pointing out that the log transformation wouldn't make a difference in your case. If the data is more interpretable to you in the original form then it is my belief that you should keep the data in that form and not just transform it to make it seem slightly better for some test (which it wouldn't even do in this case).
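
A quick way to see the invariance point is to compute the Mann-Whitney U statistic on raw and log-transformed values and compare. This is only a sketch in plain Python with made-up numbers (not the poster's data), and the helper names are hypothetical. Since the data contain zeros, it uses log(x + 1), which is still strictly increasing. In practice you would call R's wilcox.test or SciPy's `scipy.stats.mannwhitneyu` on both versions instead.

```python
import math

def midranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """U statistic for sample x, computed from ranks of the pooled data."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2

# Made-up biomass values; post-browsing is mostly zeros.
pre = [12.0, 15.5, 9.8, 20.1, 14.2, 11.0]
post = [0.0, 0.0, 0.0, 0.0, 1.2, 3.4]

u_raw = mann_whitney_u(pre, post)
u_log = mann_whitney_u([math.log1p(v) for v in pre],
                       [math.log1p(v) for v in post])
print(u_raw, u_log)
```

The two statistics match because a strictly increasing transformation leaves the ordering of the pooled values, and therefore the ranks, unchanged. The same argument applies to Mood's median test, which only uses whether each value lies above the pooled median.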
Thank you so much for your feedback! I really appreciate your comments. I did indeed do transformations and subsequent parametric testing, but in the end I reported the non-parametric results because I didn’t feel justified in employing transformations for data that I felt were better suited to non-parametric analysis. I will remove the part about it being weaker. I really don’t feel like changing my approach because someone has questioned my (what I felt was justified) use of non-parametric tests, particularly as I’m not sure if they understood the rationale or just want me to explain my reasoning (which is understandable). Conversely, lots of scientists don’t use non-parametric tests because of their added complexity. I don’t know much but I’m trying to learn, so thank you for your contribution.


Less is more. Stay pure. Stay poor.
I think your rebuttal is great. I would agree with @Dason's comments and add that your a priori analytic protocol was to use the nonparametric tests, so you should stay true to that.

I love quantile regression, which would also be a great fit - given you have enough data for the confidence intervals to be informative.

Yes, when you said you had left-skewed data and zeros, logging isn't great in that scenario. Another option, though likely not necessary, would be a permutation test, which I believe performs fine with skewed data. But I am not sure you can get an estimate out of it beyond a p-value, which is why I don't mess with it.
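
For anyone curious what the permutation test would look like, here is a minimal sketch in plain Python, assuming two independent and exchangeable groups. The test statistic (difference in medians), the function name, the number of permutations, and the data are all illustrative choices, not anything taken from the manuscript.

```python
import random
import statistics

def permutation_test_median(x, y, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference in group medians."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = statistics.median(x) - statistics.median(y)
    pooled = list(x) + list(y)
    extreme = 0
    for _ in range(n_perm):
        # Randomly reassign the pooled values to two groups of the original sizes.
        rng.shuffle(pooled)
        diff = (statistics.median(pooled[:len(x)])
                - statistics.median(pooled[len(x):]))
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value

# Made-up biomass values; post-browsing is mostly zeros.
pre = [12.0, 15.5, 9.8, 20.1, 14.2, 11.0]
post = [0.0, 0.0, 0.0, 0.0, 1.2, 3.4]

observed, p_value = permutation_test_median(pre, post)
print(observed, p_value)
```

The sketch returns the observed difference in medians alongside the p-value, so there is a point estimate of sorts, but it does not produce a confidence interval.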

P.S., This is all given you randomized animals to treatment groups and have exchangeability between groups. If not you could do multiple quantile regression.

You feeding these cattle seaweed yet? :) It would be great if the gov could subsidize the cost to supplement it into their diets.


No cake for spunky
It depends in part on sample size. If it's reasonably large, normality is relatively unimportant (and of course power may not be a major issue either if it's large enough). I don't understand why you would test for normality or equal error variance before you run a model, although I know that gets done. If it's the normality of the residuals that matters, why analyze the normality of anything else? That is, if your method has residuals at all.

Statistical tests for normality have serious issues. QQ plots are a lot better.
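
To make the QQ plot idea concrete, here is a small sketch in plain Python that computes the points of a normal QQ plot (theoretical quantile, ordered observation) and, as a rough summary of straightness, their correlation. The plotting positions (i + 0.5)/n are one common convention, the helper names are hypothetical, and both samples are made up. A real analysis would look at the plot itself, e.g. with R's qqnorm.

```python
import math
from statistics import NormalDist

def normal_qq_points(sample):
    """Pair each ordered observation with a standard normal quantile."""
    n = len(sample)
    ordered = sorted(sample)
    std_normal = NormalDist()
    # Plotting positions (i + 0.5) / n keep probabilities strictly inside (0, 1).
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

def qq_correlation(sample):
    """Pearson correlation of the QQ points; near 1 means a near-straight plot."""
    points = normal_qq_points(sample)
    qs = [q for q, _ in points]
    vs = [v for _, v in points]
    q_mean = sum(qs) / len(qs)
    v_mean = sum(vs) / len(vs)
    sxy = sum((q - q_mean) * (v - v_mean) for q, v in points)
    sxx = sum((q - q_mean) ** 2 for q in qs)
    syy = sum((v - v_mean) ** 2 for v in vs)
    return sxy / math.sqrt(sxx * syy)

# Made-up samples: one dominated by zeros, one roughly symmetric.
zero_heavy = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.2, 3.4, 8.9]
symmetric = [8.1, 9.4, 9.9, 10.2, 10.6, 11.0, 11.3, 11.9, 12.4, 13.5]

r_zero = qq_correlation(zero_heavy)
r_sym = qq_correlation(symmetric)
print(r_zero, r_sym)
```

The zero-heavy sample gives a visibly bent QQ plot (a flat run of zeros followed by a steep tail), which shows up as a lower correlation than for the symmetric sample.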

Regardless of that, unless you think the results will be wrong I would just do what the reviewer says, even if their request makes little sense. When all you have is a hammer you use that all the time. Like me, the reviewer probably learned something was important a long time ago and sticks to it even if the literature has moved on. I actually have not seen non-parametric tests in a long time.

A warning is that I am not (unlike hlsmith and Dason) an expert in statistics. I only know what I see in the literature, which they tell me is often wrong. :)