# RESOLVED but still open for discussion: When to transform and when to use non-parametric tests?

#### lucyd123

##### Member
Hi Amazing Forum of wonderfully statistically minded people.

I recently did some statistical analyses on cattle trials, and was considerably helped by this forum. I used non-parametric tests for non-normal data. However, my supervisor says that, as a rule, it's better to transform the data and use 'more common' parametric tests (such as a GLM) than to use non-parametric tests.

I've attached what I sent my supervisor, and I hoped some of you people could take a side. Have I made a mistake? And if I choose to transform for parametric testing, why (so I can explain)?

Best wishes

#### Miner

##### TS Contributor
Here are my two cents worth (note: my background is industrial statistics):
• You lose information when you transform data.
• Non-normal data sets are often the result of mixtures or of an underlying process that changes over time. Trying to transform these is a mistake.

The usual argument for transforming data is that the parametric test has more power than the equivalent non-parametric test. However, that increase in power is often small and, with reasonable sample sizes, often irrelevant. Another argument, from the slide rule era, was that transforming data would allow you to use linear regression models instead of nonlinear models. Since we no longer use slide rules, that is a rather outdated argument.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
@lucyd123 - there is no attachment.

I may differ a little from @Miner this time. I would normally go for transformations first. Transformations can give results that are more interpretable for people, provided you interpret them correctly. Skewed data are common and not necessarily the result of multiple processes, or at least of well-defined processes (e.g., costs or, say, economic growth). And parametric procedures can be robust to minor deviations from normality. You can control for covariates in GLMs, and there is more flexibility if you slide over to generalized estimating equations (GEEs).
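
To make the interpretability point concrete, here is a minimal Python sketch (not from the thread). It assumes a log transformation of a positive, right-skewed outcome; the data values and the helper name `geometric_mean_ratio` are made up for illustration. A difference of means on the log scale back-transforms to a ratio of geometric means on the original scale, which is often how log-scale results are reported.

```python
import math
import statistics


def geometric_mean_ratio(group_a, group_b):
    """Ratio of geometric means, computed via a difference of mean logs."""
    mean_log_a = statistics.fmean(math.log(x) for x in group_a)
    mean_log_b = statistics.fmean(math.log(x) for x in group_b)
    # A difference on the log scale back-transforms to a ratio
    # on the original scale.
    return math.exp(mean_log_a - mean_log_b)


# Hypothetical, made-up values: every treated value is twice the
# corresponding control value.
treated = [12.0, 15.0, 20.0, 48.0, 110.0]
control = [6.0, 7.5, 10.0, 24.0, 55.0]

ratio = geometric_mean_ratio(treated, control)
print(f"geometric mean ratio (treated / control): {ratio:.2f}")
```

Read this way, a log-scale effect says "the typical treated value is about `ratio` times the typical control value", rather than giving a difference in log units that is hard to explain.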

#### Karabiner

##### TS Contributor
> I recently did some statistical analyses on cattle trials, and was considerably helped by this forum. I used non parametric testing for non normal data

How the data are distributed is most often irrelevant for testing. Instead, the distribution of the
residuals from your statistical model can be of concern.

Transformations should not be done just to achieve a distribution (of the residuals) which is suitable
for a certain statistical test, IMHO. They should make some real sense (distributions of some
entities such as income/wealth or reaction time are often better described on a
logarithmic or exponential scale).

With kind regards

Karabiner
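
A small Python sketch of the raw-data-versus-residuals distinction (not from the thread; the group means, sample sizes, seed, and the helper `excess_kurtosis` are all assumptions chosen for illustration). Two groups are each drawn from a normal distribution but with different means. Pooled together the outcome looks clearly non-normal (bimodal), while the residuals around the group means do not.

```python
import random
import statistics


def excess_kurtosis(values):
    """Moment-based excess kurtosis; roughly 0 for normal data."""
    m = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


rng = random.Random(1)  # fixed seed so the sketch is repeatable
group_a = [rng.gauss(10.0, 1.0) for _ in range(500)]
group_b = [rng.gauss(20.0, 1.0) for _ in range(500)]

# The raw outcome, ignoring the group structure.
pooled = group_a + group_b

# Residuals from the simplest model: each value minus its own group mean.
mean_a = statistics.fmean(group_a)
mean_b = statistics.fmean(group_b)
residuals = [x - mean_a for x in group_a] + [x - mean_b for x in group_b]

kurt_pooled = excess_kurtosis(pooled)
kurt_residuals = excess_kurtosis(residuals)
print(f"excess kurtosis, pooled raw data: {kurt_pooled:.2f}")
print(f"excess kurtosis, residuals:       {kurt_residuals:.2f}")
```

A normality check on the pooled values would flag a problem here even though the model assumption that matters, about the residuals, is satisfied.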

#### noetsi

##### Fortran must die
I don't know which is more technically correct, but in economics it's well accepted to use, say, logs with skewed data. I assume that if it's that common in a field that is highly advanced methodologically, there must be a good reason to do so...

It's not only violations of assumptions that are involved. Sometimes transforming data makes interpretation easier.

#### lucyd123

##### Member
Wow, OK, so it sounds like it depends on many factors. In the attached document I have tried first non-parametric testing, and then a square root transformation followed by parametric testing, on the same dataset. Sounds like there is not a right or wrong answer!

#### Karabiner

##### TS Contributor
The right answer is that you need not be concerned about normality for a t-test
if the sample size is large enough. Your total sample size appears to be > 200, which
would certainly be sufficient.

With kind regards

Karabiner
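
The large-sample argument can be illustrated with a short simulation. This is only a Python sketch under assumed conditions: exponentially distributed data as a stand-in for a skewed outcome, 100 observations per group, and a fixed seed; none of these come from the original data. The t-test relies on the sampling distribution of the mean, and that distribution is far less skewed than the raw values.

```python
import random
import statistics


def skewness(values):
    """Moment-based sample skewness; 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Strongly right-skewed raw data (assumed exponential for illustration).
raw = [rng.expovariate(1.0) for _ in range(20000)]

# Means of many samples of size 100 drawn from the same distribution.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(100))
    for _ in range(2000)
]

skew_raw = skewness(raw)
skew_means = skewness(sample_means)
print(f"skewness of raw values:   {skew_raw:.2f}")
print(f"skewness of sample means: {skew_means:.2f}")
```

For an exponential distribution the theoretical skewness is 2 for single values and 2/sqrt(n) for the mean of n values, so about 0.2 at n = 100.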

#### GretaGarbo

##### Human
But you cannot replace a missing value with zero. (If I refuse to tell you my height, it does not mean that it is zero.)

Code:
```r
# change NA's to 0's
my_data[is.na(my_data)] <- 0
```
It seems like most of your data are positive values. Imposing zero values will increase the skewness (and it will also be a sort of fabrication, sorry).
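
A tiny illustration of the effect, sketched in Python rather than R, with made-up numbers that are not the poster's data: filling missing entries with zero pulls the mean down and widens the spread compared with using only the observed values.

```python
import statistics

# Hypothetical measurements; None marks a missing entry.
measurements = [4.2, None, 5.1, 3.8, None, 4.9]

# Option 1: keep only the values that were actually observed.
observed = [x for x in measurements if x is not None]

# Option 2: replace every missing entry with zero.
zero_filled = [0.0 if x is None else x for x in measurements]

mean_observed = statistics.fmean(observed)        # about 4.5
mean_zero_filled = statistics.fmean(zero_filled)  # about 3.0

print(f"mean of observed values: {mean_observed:.2f}")
print(f"mean after zero-filling: {mean_zero_filled:.2f}")
print(f"sd of observed values:   {statistics.stdev(observed):.2f}")
print(f"sd after zero-filling:   {statistics.stdev(zero_filled):.2f}")
```

The zero-filled summary describes animals that were never measured as if they had a true value of zero, which is only right if zero really was the measurement.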

#### noetsi

##### Fortran must die
Yes. Multiple imputation is the best way to address this.

#### GretaGarbo

##### Human
> Yes. Multiple imputations is the best way to address such.

I think that it is better to simply leave the missing values as missing, and to not use them in the testing.

#### noetsi

##### Fortran must die
> I think that it is better to simply leave the missing values as missing, and to not use them in the testing.

The problem is that this can eat up your cases very quickly. I lost 7 percent of my cases with just 3 variables. And some argue that if values are missing, your results will be invalid (the MNAR issue).

#### GretaGarbo

##### Human
Then we ask the original poster: why were there missing values?

(MNAR = missing not at random)

#### CamilleJosion

##### CaJosion
Transforming is worth it if reaching a particular condition (for example, normality) helps improve the quality of your analysis (reducing Type I or Type II error) or lets you check specific hypotheses (for example, you can do linear regression even with totally non-normal data, but then you lose the possibility of testing several hypotheses). Otherwise non-parametric is fine. Monte Carlo or bootstrap approaches are also very good. Regarding transformations, I like to use the Johnson transformation (you can find it in Minitab, XLSTAT, R).
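
For readers who have not seen the bootstrap option, here is a minimal Python sketch of a percentile bootstrap interval for a difference in medians between two groups. It is one simple variant among several; the function name `bootstrap_ci_median_diff`, the simulated log-normal data, the group sizes, and the number of resamples are all assumptions for illustration.

```python
import random
import statistics


def bootstrap_ci_median_diff(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for median(a) - median(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group independently, with replacement.
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(statistics.median(resample_a) - statistics.median(resample_b))
    diffs.sort()
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the differences.
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical right-skewed data for two groups of 60 animals each.
data_rng = random.Random(7)
group_a = [data_rng.lognormvariate(1.0, 0.5) for _ in range(60)]
group_b = [data_rng.lognormvariate(0.0, 0.5) for _ in range(60)]

lower, upper = bootstrap_ci_median_diff(group_a, group_b)
print(f"95% bootstrap interval for the median difference: ({lower:.2f}, {upper:.2f})")
```

No normality assumption or transformation is needed here; an interval that excludes zero suggests a difference in medians, though the bootstrap still assumes the observations are independent and representative.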

#### lucyd123

##### Member
> Then we ask the original poster: why were there missing values?
>
> (MNAR = missing not at random)

Thank you for this observation. Actually, this was a data-entry error: the person entering the data from my notes confused a 0 with leaving the cell in the Excel sheet empty! All parameters were measured (I measured them).

#### lucyd123

##### Member
Also: Thank you for all this amazing advice and insight. There is a lot here that had never even occurred to me! Best wishes, L