About box plots...

Sigh. I'm working with a client who wanted benchmarking data for one of the surveys we ran, so we went to an external provider who has access to lots of survey data. He did the comparison for us, but because his surveys ask the same question on different scales, what we got back is basically a table version of a very big box plot.

Here's my issue. He presents it with quartiles and a standardized score (1-100), and, as you might expect, the quartiles vary in size. It looks like he got counts of observations for each point on the scale. So, for example, the bottom quartile of the scale is 0-25, but it holds more observations than the other quartiles because more of the population scores in that range.

My problem is that my client is not "getting" this. When you look at how quartiles are usually explained, it's not the scale that gets divided up...it's the scores. Thing is, if you create quartiles just from the scores themselves, you don't get much variation because, well, you're just quartering them and you get 25% or so in each. So it HAS to be some combination of dividing up the scale, then getting counts for each point on the scale that total up to create your quartiles and your plot.
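To make the two readings of "quartile" concrete, here is a minimal Python sketch. It assumes made-up standardized scores skewed toward the low end; the data and the helper names `share_by_scale_band` and `share_by_score_quartile` are hypothetical and are not the provider's actual method. It contrasts cutting the scale into four equal-width bands with cutting the scores into four equal-sized groups.

```python
import random
import statistics

random.seed(0)
# Hypothetical standardized scores on a 0-100 scale, skewed toward the low end.
scores = [min(100.0, random.expovariate(1 / 25)) for _ in range(1000)]

def share_by_scale_band(values, edges=(0, 25, 50, 75, 100)):
    """Share of observations in each equal-width band of the SCALE."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return [c / len(values) for c in counts]

def share_by_score_quartile(values):
    """Cut points and shares when the SCORES themselves are quartered."""
    cuts = statistics.quantiles(values, n=4)  # Q1, median, Q3
    counts = [0, 0, 0, 0]
    for v in values:
        counts[sum(v > c for c in cuts)] += 1
    return cuts, [c / len(values) for c in counts]

band_shares = share_by_scale_band(scores)
cuts, quartile_shares = share_by_score_quartile(scores)

print("Share per scale band (0-25, 25-50, 50-75, 75-100):", band_shares)
print("Score quartile cut points:", [round(c, 1) for c in cuts])
print("Share per score quartile:", quartile_shares)
```

With skewed data like this, the scale bands hold very unequal shares of respondents, while the score quartiles each hold about 25% but span very different widths of the scale. That may be the distinction the client is tripping over.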

Does this make any sense? Does anyone have a better way of explaining this? They're so frustrated with it, and I can't say I blame them. But I can't dumb it down any more than I have, and this is the only way to give them what they want (benchmarking data).



Less is more. Stay pure. Stay poor.
How about having them visualize it with a histogram, so they can see the distribution and understand why their approach is less than ideal? Also, if the data are clustered, having them report the median with Q1 and Q3 (which I am unsure if they have done) may help show why their approach is going to muddle things. A probability density function graph would also be revealing.
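As a rough illustration of that suggestion, the following Python sketch prints a text histogram alongside the median, Q1, and Q3. The scores are invented for the example, and `text_histogram` is a hypothetical helper, so treat it as a starting point rather than a finished report.

```python
import random
import statistics
from collections import Counter

random.seed(1)
# Hypothetical scores on a 0-100 scale, clustered at the low end.
scores = [min(100, int(random.expovariate(1 / 25))) for _ in range(500)]

def text_histogram(values, width=10):
    """One line per bin of `width` scale points; a score of 100 joins the top bin."""
    n_bins = 100 // width
    bins = Counter(min(v // width, n_bins - 1) for v in values)
    lines = []
    for b in range(n_bins):
        lo, hi = b * width, (b + 1) * width
        bar = "#" * (bins[b] // 5)  # one '#' per 5 observations
        lines.append(f"{lo:3d}-{hi:3d} | {bar} ({bins[b]})")
    return lines

q1, median, q3 = statistics.quantiles(scores, n=4)

print("\n".join(text_histogram(scores)))
print(f"Q1={q1:.1f}  median={median:.1f}  Q3={q3:.1f}")
```

Seeing the long bars at the low end next to a low median and a narrow Q1-to-median gap can make it easier to explain why equal slices of the scale do not hold equal numbers of people.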

So do you plan to standardize your values as well?


TS Contributor
They are probably not providing what you think, but are actually providing something commonly used by survey companies, which typically use 5 buckets and Top box/Bottom box scores.
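For anyone unfamiliar with box scores, here is a small Python sketch assuming a 5-point scale where the top box is the share of respondents choosing the highest rating and the bottom box is the share choosing the lowest. Definitions vary between providers (some report top-2 box instead), and the responses and the `box_scores` helper below are hypothetical.

```python
from collections import Counter

def box_scores(responses, scale_max=5, n_boxes=1):
    """Return (top, bottom) shares for the highest/lowest `n_boxes` ratings."""
    counts = Counter(responses)
    total = len(responses)
    top = sum(counts[k] for k in range(scale_max - n_boxes + 1, scale_max + 1))
    bottom = sum(counts[k] for k in range(1, n_boxes + 1))
    return top / total, bottom / total

# Hypothetical answers to one question on a 1-5 scale.
responses = [5, 4, 4, 3, 5, 2, 1, 4, 5, 3]

top, bottom = box_scores(responses)               # share of 5s, share of 1s
top2, bottom2 = box_scores(responses, n_boxes=2)  # share of 4s+5s, share of 1s+2s

print(f"Top box: {top:.0%}, bottom box: {bottom:.0%}")
print(f"Top-2 box: {top2:.0%}, bottom-2 box: {bottom2:.0%}")
```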