# Disagreement among statisticians

#### noetsi

##### No cake for spunky
I should probably put "statistician" in quotes here, because I would guess most of these people use statistics but have terminal degrees in fields other than statistics.

I suspect Jake and Dason will say I should test these differences myself (although given that experts disagree commonly and presumably do this already I am not optimistic about this approach given that I am no expert).

This is a link that reflects, from posters whose expertise I have no way to assess, the differences I often find in texts or discussions of statistical topics. Some say Kruskal-Wallis is testing 1) medians, 2) means, or 3) distribution (if ranks are effectively the same).

http://en.wikipedia.org/wiki/Talk:Kruskal–Wallis_one-way_analysis_of_variance

I have found similar disagreements among academic sites on this topic.

#### noetsi

##### No cake for spunky
Perhaps this goes to the real point [having said that, I am uncertain whether the authors I have read or these references would agree with the following].

> There is considerable confusion in the literature over this matter. Some authors state unambiguously that there are no distributional assumptions, others that the homogeneity of variances assumption applies just as for parametric ANOVA. The confusion results from how you interpret a significant result. If you wish to compare medians or means, then the Kruskal-Wallis test also assumes that observations in each group are identically and independently distributed apart from location. If you can accept inference in terms of dominance of one distribution over another, then there are indeed no distributional assumptions.

http://influentialpoints.com/Training/Kruskal-Wallis_ANOVA_use_and_misuse.htm

I am not entirely certain what dominance is [references to it, as for instance here, don't define that term]. It would be nice if the authors who disagree addressed the points the author above makes, but at least in what I have seen they usually do not. I suspect details like this are behind many "disagreements." That said, in my own experience in academics over several decades, academics commonly just don't agree on what reality is, rather than simply failing to detail technical points fully.

I don't know if this is true of statisticians or not.

#### noetsi

##### No cake for spunky
I meant this thread to be a bit silly, reflecting a frustration I often have as a non-statistician trying to do statistical analysis. But it has a serious dimension given that statistics is used in a wide range of analyses that affect people's lives, usually by people who are not formally statisticians (that is, their degree is not from a statistics department).

> Several of the examples we found in the literature failed to meet even the basic assumptions of random sampling and independence. In one case Kruskal-Wallis was misused for repeated measures on the same patients - the non-parametric Friedman test would have been perfectly adequate or (following transformation) a paired t-test. The test is also not appropriate for comparing observations in a time series, or for observations where there is spatial autocorrelation - although we look at one way of coping with the latter problem. Pseudoreplication is often present - we look at one example where slugs are treated in groups of ten, yet in the analysis each slug is treated as an independent replicates.

They listed other serious problems besides this such as interpreting KW as a difference in means when mean data was hardly reasonable.

I once read a review of medical research in elite medical journals that used logistic regression. Serious mistakes were found in the methods in those journals. Yet these analyses were undoubtedly used to inform key decisions on surgery, drugs, treatment, etc.

#### Miner

##### TS Contributor
I reviewed my copy of Nonparametric Methods for Quantitative Analysis by Jean Dickison Gibbons. The test is actually comparing differences in mean ranks. If the distributions are constrained to be the same shape, typically one of the assumptions for the Kruskal-Wallis test, this is equivalent to comparing the medians. However, if the shapes differ (e.g., variance, skewness, etc.) it is no longer equivalent and you are only comparing mean ranks.
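
The "mean ranks" point can be checked directly. The following is a minimal sketch on made-up data (the three groups, their sizes, and the seed are illustrative assumptions, not taken from Gibbons): it rebuilds the Kruskal-Wallis statistic from each group's mean rank in the pooled sample and compares it with the value returned by R's `kruskal.test`. The data are continuous, so the tie correction does not come into play.

```r
# Sketch: recompute the Kruskal-Wallis H statistic from group mean ranks.
set.seed(1)  # arbitrary seed so the made-up data are reproducible

# Hypothetical data: three groups from continuous distributions (no ties).
groups <- list(a = rnorm(12), b = rnorm(15, mean = 0.5), c = rexp(10))

values <- unlist(groups, use.names = FALSE)
labels <- rep(names(groups), times = lengths(groups))
ranks  <- rank(values)   # ranks within the pooled sample
N      <- length(values)

n_i       <- tapply(ranks, labels, length)  # group sizes
mean_rank <- tapply(ranks, labels, mean)    # mean rank of each group

# H grows as the group mean ranks move away from the overall mean rank (N + 1) / 2.
H_manual  <- 12 / (N * (N + 1)) * sum(n_i * (mean_rank - (N + 1) / 2)^2)
H_builtin <- unname(kruskal.test(groups)$statistic)

print(c(manual = H_manual, builtin = H_builtin))
```

The two numbers should agree, which shows that the raw values enter the statistic only through their ranks.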

#### noetsi

##### No cake for spunky
Which is what the last author I cited suggested as well [although he argued that you could use it to actually compare means as well in some special cases]. The problem is that various authors [all from academic sites] don't make these distinctions. They say it can be used (depending on the author) for means, medians, or ranks and fail to explain [in both my experience and the experience of the author I cited above] the additional assumptions this requires.

It isn't clear to me that they would agree these limitations exist since they do not address them. Statistics is one field where qualifying assumptions are critical, but the nature of journal articles and much in the academic realm does not lend itself to going into detail on such issues. You simply assume everyone knows and that there is no variation in usage of a method among researchers or practitioners.

Something my own experience in academics would not support. There is a lot of variation... My guess is that the non-statisticians who use statistics may not even be aware of these issues. They probably never come up in the classes they take on statistics or the casual literature.

It would be interesting to know what percent of those who do statistical analysis, in either academics or as practitioners, took their statistics in statistics departments. I would guess only a small percent do.

#### noetsi

##### No cake for spunky
This author is apparently a real statistician [as compared to playing one in another academic setting]. This is part of his comments:

> So, Mann-Whitney U test assumes the equal variances (homoscedasticity) and the different variations of two populations affect results of the test. It has been noted for a long time in statistical books (for references, see the paper shown in below). However, unfortunately, there are some (not rare) examples where authors wrote that they used Mann-Whitney U test because of unequal variances (!).

lol

http://kasuya.ecology1.org/stats/utest01e.html

This of course conflicts with the author I cited above [who made sense to me] who argued that equal variances is not required if you are only looking at this test in terms of rank dominance.
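
The second author's claim that unequal variances affect the test is something a simulation can probe. Below is a rough sketch under assumptions of my own choosing (normal populations, sample sizes of 10 and 40, standard deviations of 1 and 5, 2000 replications; none of these come from either cited author). Both populations are centred at zero, so their medians are equal and a value from one group is as likely to exceed a value from the other as not. The helper `reject_rate` is a hypothetical name introduced only for this illustration.

```r
# Sketch: how often does the Mann-Whitney (Wilcoxon rank-sum) test reject
# at the 5% level when both populations are centred at 0?
set.seed(42)  # arbitrary seed for reproducibility

# Hypothetical helper: simulated rejection rate for two normal samples.
reject_rate <- function(n1, sd1, n2, sd2, reps = 2000, alpha = 0.05) {
  p <- replicate(reps, wilcox.test(rnorm(n1, 0, sd1), rnorm(n2, 0, sd2))$p.value)
  mean(p < alpha)
}

rate_equal   <- reject_rate(10, 1, 40, 1)  # equal spread in both groups
rate_unequal <- reject_rate(10, 5, 40, 1)  # small group has the larger spread

print(c(equal_sd = rate_equal, unequal_sd = rate_unequal))
```

In runs like this the equal-spread rate should sit near the nominal 5%, while the unequal-spread rate tends to land noticeably above it. That suggests the spread matters even when neither the medians nor the chance of one group beating the other differ, at least for this particular configuration.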

#### Dason

##### Ambassador to the humans
> This of course conflicts with the author I cited above [who made sense to me] who argued that equal variances is not required if you are only looking at this test in terms of rank dominance.

Was the author talking about testing for a difference in medians though?

#### Miner

##### TS Contributor
> Which is what the last author I cited suggested as well [although he argued that you could use it to actually compare means as well in some special cases]. The problem is that various authors [all from academic sites] don't make these distinctions. They say it can be used (depending on the author) for means, medians, or ranks and fail to explain [in both my experience and the experience of the author I cited above] the additional assumptions this requires.

I saw where this author stated that if the distributions were symmetrical in addition to having the same shape/equal variances that you could extend the test to means. I think the author bases this on the notion that when depicted graphically a symmetrical distribution appears to have the same mean and median. However, this is an artifact of histogram binning. The mean and the median may be very close, but it would be unusual for them to be identical. This is a difficult leap to make.

#### Miner

##### TS Contributor
> This of course conflicts with the author I cited above [who made sense to me] who argued that equal variances is not required if you are only looking at this test in terms of rank dominance.

I would agree with this. While dominance might not be a precise term, it does express the concept of mean ranks rather well.

#### Dason

##### Ambassador to the humans
> I saw where this author stated that if the distributions were symmetrical in addition to having the same shape/equal variances that you could extend the test to means. I think the author bases this on the notion that when depicted graphically a symmetrical distribution appears to have the same mean and median. However, this is an artifact of histogram binning. The mean and the median may be very close, but it would be unusual for them to be identical. This is a difficult leap to make.

If you actually do assume that the populations are symmetric (and that the means exist) then there is nothing wrong with it. It adds another assumption to the test though.

#### Dason

##### Ambassador to the humans
> I would agree with this. While dominance might not be a precise term, it does express the concept of mean ranks rather well.

Dominance sort of is a precise term - and it's actually what the test is testing (not mean ranks - the mean ranks is just the test statistic).

#### noetsi

##### No cake for spunky
> Was the author talking about testing for a difference in medians though?

I am not sure which author you mean here. The original author I cited argued that equal variance is not required if all you were doing is a test of dominance [rank structure or distribution]. The second author simply stated: "So, Mann-Whitney U test assumes the equal variances (homoscedasticity) and the different variations of two populations affect results of the test. "

He does not say you have to have equal variance in some cases, for some purposes, he simply says it is required [which logically means it is always required]. He may have a specific purpose that it is required for and not others, but nowhere does he suggest this. In fact reading his comments I got the exact opposite sense, that he always felt it was required.

Which is a central point of this thread. Authors may mean that an assumption is only needed in using a method for a certain purpose, but they don't say this - at least in the cases I am citing. So the reader is likely going to assume it is required generally based on reading this [I have no way from the link to tell if the second author felt equal variances could sometimes be ignored. As noted they never suggest this].

#### noetsi

##### No cake for spunky
> I saw where this author stated that if the distributions were symmetrical in addition to having the same shape/equal variances that you could extend the test to means. I think the author bases this on the notion that when depicted graphically a symmetrical distribution appears to have the same mean and median. However, this is an artifact of histogram binning. The mean and the median may be very close, but it would be unusual for them to be identical. This is a difficult leap to make.

A really interesting point. At times I have assumed that ordered data could be interval in nature, and thus a mean calculated, but I do it based on whether it makes sense substantively for this to be true. Or at times I have assumed there was no reason not to, since statistics using means are so much easier to use in the software and methods I know.

#### noetsi

##### No cake for spunky
What is statistical "dominance"? In honesty I never ran into this term before today. One more thing to learn.

I assume it means one distribution is above another on whatever they are being ranked on.
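
One common way to make that idea concrete is the probability that a randomly chosen value from one group exceeds a randomly chosen value from the other, counting ties as half. This is my reading of "dominance", since the linked page does not define it. The sketch below uses two small made-up samples and estimates that probability by comparing every pair. It also shows that the `W` statistic reported by R's `wilcox.test` is the same pair count, so dividing by the number of pairs gives the same estimate.

```r
# Sketch: estimate P(X > Y) + 0.5 * P(X == Y) from two made-up samples.
x <- c(3.1, 4.7, 5.2, 6.0, 7.4)  # hypothetical sample 1
y <- c(2.0, 2.9, 3.5, 4.1)       # hypothetical sample 2

# Compare every x with every y; a tie counts as half a "win".
wins  <- outer(x, y, ">")
ties  <- outer(x, y, "==")
p_hat <- mean(wins + 0.5 * ties)

# The rank-sum statistic W counts the same winning pairs.
W        <- unname(wilcox.test(x, y)$statistic)
p_from_W <- W / (length(x) * length(y))

print(c(p_hat = p_hat, p_from_W = p_from_W))  # both 0.9
```

Here 18 of the 20 pairs have the `x` value on top, so the estimate is 0.9. A value of 0.5 would mean neither group tends to sit above the other.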

#### noetsi

##### No cake for spunky
I was thinking of just how many books, articles, and links would have fits with this statement.

> Standard t-tests and ANOVA methods are frequently thought to be sensitive to the assumption of normality. That really isn’t true. Central limit results of various types are such that the standard parametric approaches are reasonably robust to departures from the assumption of normally distributed errors.

Of course the disagreement would probably be over what "reasonably robust" is and what it means in practice. Non-parametrics were created in part because researchers felt that non-normality was a significant problem for ANOVA.
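
One way to put a number on "reasonably robust" is to simulate it. The sketch below is only an illustration under assumptions I picked (exponential data, 30 observations per group, 2000 replications, none of which come from the paper). Both samples come from the same skewed distribution, so the null hypothesis is true, and the simulation counts how often the default Welch `t.test` rejects at the 5% level.

```r
# Sketch: type I error of the two-sample t-test on clearly non-normal data.
set.seed(7)  # arbitrary seed for reproducibility
reps <- 2000

# Both groups are Exponential(1), so any rejection is a false positive.
p <- replicate(reps, t.test(rexp(30), rexp(30))$p.value)
type1 <- mean(p < 0.05)

print(type1)
```

If the rate comes out close to 0.05, that supports the quoted claim for this sample size and this kind of skew. Smaller samples or heavier tails may behave differently, which is where the argument over "reasonably" starts.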

> That being said, it should be added that linear rank tests may be more powerful than their parametric counterparts under certain distributional assumptions. What can be disastrous for standard parametric procedures is the existence of outliers. That is to say that while they are robust, they are not generally resistant.

I have encountered this point before. I have a book that argues that even a small number of outliers could totally distort the results of ANOVA and GLM generally and that larger sample sizes would not address this. That is asymptotic methods would not eliminate the problem.

http://www.wuss.org/proceedings09/09WUSSProceedings/papers/anl/ANL-Hobbs.pdf
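
The "robust but not resistant" distinction is easy to see on a toy example. The numbers below are invented for illustration and are not from the paper: two groups that are clearly separated, then the same data with a single value replaced by a gross outlier.

```r
# Sketch: one outlier can hide a clear difference from the t-test,
# while the rank-based test barely notices.
x     <- c(10, 11, 12, 13, 14, 15, 16, 17)
y     <- c(1, 2, 3, 4, 5, 6, 7, 8)
y_out <- replace(y, 8, 200)  # one gross outlier: the 8 becomes 200

p_values <- c(
  t_clean   = t.test(x, y)$p.value,
  t_outlier = t.test(x, y_out)$p.value,
  w_clean   = wilcox.test(x, y)$p.value,
  w_outlier = wilcox.test(x, y_out)$p.value
)

print(signif(p_values, 3))
```

With the clean data both tests give very small p-values. With the outlier, the t-test p-value rises well above 0.05 because the outlier inflates both the mean and the variance of the second group, while the rank-sum test still rejects at the 5% level because the outlier only changes one rank.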

#### CB

##### Super Moderator
Lots of interesting stuff here.

I understand your concern, noetsi. I have seen a lot of postgrad students (in psychology) get frustrated when trying to do quantitative projects, because everyone seems to be giving them different advice.

Some points I would make, though:

1) Statistics is an active field of study, just like any other! We do not expect psychologists or physicists or biologists to agree on everything. Why would we expect statisticians to?

2) Scientific evidence is not obtained by a show of hands. Looking at the general consensus in a field is fine as a very rough heuristic when we're looking at a field that we know nothing about. But in general, if you want to work out the truth about the world, you need to worry less about what others believe, and more about what the evidence or arguments backing up their beliefs are.

3) In statistics we’re particularly lucky because the evidence for claims about statistical theory is particularly transparent. I.e., claims are justified by mathematical theorems and/or simulation studies. So anyone can directly access the evidence that backs up a claim (unlike some other fields, where you have to rely on reports from other researchers about esoteric experimental procedures).

4) That’s not to say that understanding the evidence backing up a statistical argument is easy! But it can be done, particularly for simpler claims.

5) So for example in the case of the claim you mention, take the following R code:

```r
X1 = c(1:99)
X2 = c(-100:-52, 50, 50, 50, 51:97)
median(X1); median(X2)  # both variables have the same median (50)
kruskal.test(x = list(X1, X2))
# But the Kruskal-Wallis test rejects the null hypothesis, p = 0.001
```
Since the Kruskal-Wallis test gives a significant result despite the two samples having exactly the same median, it obviously is not a test of the medians being equal (at least not without adding additional assumptions).
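
To see what the test is reacting to in this example, one can compare every value of `X1` with every value of `X2` and count how often the `X1` value is larger, with ties counted as half. This is a sketch of one way to quantify the "dominance" discussed earlier in the thread. The variable `p_hat` is a name introduced here for illustration.

```r
# Sketch: same medians, but X1 values usually exceed X2 values.
X1 <- c(1:99)
X2 <- c(-100:-52, 50, 50, 50, 51:97)

# Proportion of (X1, X2) pairs where X1 is larger, ties counted as half.
p_hat <- mean(outer(X1, X2, ">") + 0.5 * outer(X1, X2, "=="))

print(c(median_X1 = median(X1), median_X2 = median(X2), p_hat = p_hat))
```

The proportion comes out at about 0.63 rather than 0.5, because nearly half of `X2` lies far below all of `X1`. The medians match, but the rank positions of the two samples do not.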

#### rogojel

##### TS Contributor
hi,
just my five cents: from a practical point of view this does not seem to be a big problem imho. If I compare two groups, the Kruskal-Wallis test could tell me whether values in one group tend to be greater than the values in the other group, and this would be the information I need (my practical interpretation of dominance). Under certain assumptions this can be formulated more concisely as "the median of the first group is less than the median of the second", but most of the time the first, looser formulation should be enough, I guess.

regards
rogojel

#### noetsi

##### No cake for spunky
I agree with rogojel's point that in practice dominance, which all agree Kruskal-Wallis measures, is enough for what I do. This thread was really not about what it measures, but about an issue that frustrates me (as a non-statistician): you can read different sources on a wide range of statistical questions and get totally different answers. As a non-expert it is difficult for me to be sure who is correct in those cases. I don't really have enough confidence in my own grasp of theory, or simulation, or for that matter matrix algebra (which drives me crazy), to think I will ever be able to decide for myself who is right.

> Statistics is an active field of study, just like any other! We do not expect psychologists or physicists or biologists to agree on everything. Why would we expect statisticians to?

Because to me statistics is math and a science. I expect there to be real answers that can be definitively decided. Perhaps that is an unrealistic expectation (after all physicists are still trying to decide what light is as they have for the last half century at least). I came from academic fields, administration and political science, where disagreements about basic issues are very common and essentially going nowhere (look at the organizational motivation or leadership field if you have any doubts on that point or what leads to conflict for that matter).

I really expected statistics, math based and a science to be different. But I am less sure of that any more. It appears to be much like other academic disciplines.

#### Dason

##### Ambassador to the humans
> I really expected statistics, math based and a science to be different.

The professor that taught my masters level methods course emphasized that statistics is both math and art. He even went so far as to divide the room into two sides and would move from side to side as he switched from talking about the solid facts (math) and the art (the more subjective parts of actually practicing statistics).

The methods that you use are all mathematical. If the assumptions are met then x, y, and z will be true. The issue is that you don't seem to care about that stuff. You're talking about what to do in practice. And in practice nothing is cut and dry - it's up to you to decide if the assumptions are or aren't met. This is where there tends to be disagreement because you're out of the realm of math and stepping into the realm of the art of statistics.

When it comes to the disagreements you've been mentioning here where you aren't even sure about the assumptions - that is because of a few reasons. 1) People are idiots - get over it 2) Sometimes you might miss (or not understand) a few crucial details or assumptions that were made that can influence what is being discussed 3) You're reading different articles where some people are operating under the assumption that you're only discussing what happens in theory and some other articles where they use simulation studies or empirical results to give rules of thumb about what might be done in practice. 4) Did I mention that people are idiots? Also some people learn from idiots and then pawn off the wrong facts as true and might even mangle them even more in the process.

So yes statistics is cut and dry and there always is a clear cut answer IF (and this is a big if) everybody agrees on what assumptions are reasonable in every given situation and they agree on what modeling techniques are best.

You know a good way to decide for yourself if a given method is appropriate in a certain situation? Simulation.
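
As a rough idea of what such a simulation can look like, here is a sketch comparing the power of the t-test and the Wilcoxon rank-sum test when the data are heavy-tailed. Every setting in it is an assumption chosen for illustration: a t distribution with 3 degrees of freedom, 30 observations per group, a true shift of 1, and 2000 replications.

```r
# Sketch: which test detects a real shift more often on heavy-tailed data?
set.seed(123)  # arbitrary seed for reproducibility
reps  <- 2000
n     <- 30
shift <- 1  # the second population is shifted up by this amount

rejections <- replicate(reps, {
  x <- rt(n, df = 3)
  y <- rt(n, df = 3) + shift
  c(t = t.test(x, y)$p.value < 0.05,
    w = wilcox.test(x, y)$p.value < 0.05)
})

# Each row holds one test's reject/accept decisions; the row mean is its power.
power <- rowMeans(rejections)
print(power)
```

Under these settings the rank-based test should reject more often than the t-test. Swapping `rt` for `rnorm`, or changing `n` and `shift`, lets you check the situation you actually face rather than relying on a general rule.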

#### noetsi

##### No cake for spunky
I am working on learning simulation and bootstrapping. It is absolutely true that as a non-statistician, statistics is of interest to me purely for practical reasons. To decide substantive issues through statistical methods. My own research is in other areas, that rarely uses quantitative approaches. And as I have mentioned before I don't believe my own math skills will ever be good enough for me to understand the theory behind statistics.

I am not sure that the disagreements reflect idiocy [although I am struck by how strong the disagreement is on some issues, for example the use of Likert data and the assumptions behind linear regression, which in practice is used broadly in areas where many statisticians feel linear approaches are invalid]. It is striking to see how statistics is used by different fields and how strongly fields disagree. Statistics is unusual in that it is broadly used in research and practice by individuals who are not trained statisticians [I am certainly one, although formally I have a graduate degree in "applied statistics" taught by non-statisticians for the most part]. As quantitative studies grew over the last half century, much of statistics ended up being done by, and texts written by, non-statisticians. I think that causes much of the disagreement.

But even in the hard sciences, where this is not generally true, there remain strong disagreements. Some things are beyond our knowledge at present. And it is certainly true that while in theory if certain things hold you will get certain results, I suspect those things rarely hold in real world analysis. I read one author who argued that normal distributions were essentially unknown in real world data. Lastly it might be noted that academics by its nature encourages people to disagree and challenge existing knowledge. You don't get published or promoted by saying Dr. Smith was right. You get published by saying he was not right, or what he said did not matter because this other approach is preferable. And I think the people who come into academics generally are argumentative and challengers of reality (if not just stuck up) and that matters on such issues as well.

Hey I am criticizing myself there as I spent much of my life in academics