effect size to report or not

trinker

ggplot2orBust
I challenged a prof today and said that an effect size should not be reported for non-significant results. He strongly disagreed.

What are your thoughts (I'm ok if I'm wrong but I want to know why). If anyone has specific sources to cite that would be awesome.

spunky

Can't make spagetti
what's your rationale for not reporting them if the results are not significant? i've never been an overall enthusiast of effect sizes myself but i guess it wouldn't hurt to provide some extra info to the reader of your research beyond p-values...

trinker

ggplot2orBust
for not: they could influence people to think there's something there when there's not. It plays into post hoc power calculations. It's an excuse for not finding significance. Besides, if you've reported what you're supposed to, the effect size can be calculated by anyone from your write-up.

trinker

ggplot2orBust
Here's my evidence for why we shouldn't report an effect size and why it can be misleading:

Code:
> y <- c(rnorm(19, 2), 100)
> z <- rep(c("c", "t"), each=10)
>
> dat <- data.frame(y=y, z=z)
>
> mod <- lm(y~z, data=dat)
> summary(mod)

Call:
lm(formula = y ~ z, data = dat)

Residuals:
Min      1Q  Median      3Q     Max
-12.567  -9.376  -1.696   0.153  88.434

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)    1.743      6.962   0.250    0.805
zt             9.823      9.846   0.998    0.332

Residual standard error: 22.02 on 18 degrees of freedom
Multiple R-squared: 0.0524,     Adjusted R-squared: -0.0002484
F-statistic: 0.9953 on 1 and 18 DF,  p-value: 0.3317

> anova(mod)
Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
z          1  482.5  482.46  0.9953 0.3317
Residuals 18 8725.4  484.74
>
> # Cohen's d: mean difference divided by the root mean square error
> diff(sapply(split(dat$y, dat$z), mean)) / sqrt(anova(mod)$"Mean Sq"[2])
t
0.4461571
The effect is not significant even though the standardized mean difference (Cohen's d) indicates a medium-sized effect. The outlier leads us astray.
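For anyone who wants to poke at the same idea outside R, here is a rough standalone sketch in Python. The data are made up to mimic the example above, and `cohens_d` is a hypothetical helper written for this post, not a library function: two groups hovering around the same mean, where a single outlier alone manufactures a "medium" d.

```python
import statistics
from math import sqrt

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(b) - statistics.mean(a)) / sqrt(sp2)

# Both groups sit around 2, but the treatment group has one wild value
control = [2.1, 1.8, 2.3, 1.9, 2.0, 2.2, 1.7, 2.4, 2.1, 1.9]
treatment = [2.0, 2.2, 1.8, 2.1, 2.3, 1.9, 2.0, 2.2, 1.8, 100.0]

print(cohens_d(control, treatment))               # ≈ 0.45, a "medium" effect
print(cohens_d(control, treatment[:-1] + [2.1]))  # ≈ 0 once the outlier is replaced
```

A single point supplies essentially all of the mean difference, yet the resulting d reads as "medium" even though nine of the ten treatment values are indistinguishable from control.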

Karabiner

TS Contributor
Effect sizes pertain to populations.

Strictly speaking, one cannot
report an effect size from sample data.
Sample "effect sizes" are smaller
or larger than true effect sizes.
In case of small to moderate sample
sizes, the error may be considerable.

But it might be nice for those
who perform meta analyses or
write review articles to provide
them with such "effect size" figures.

With kind regards

K.

SmoothJohn

New Member
for not: they could influence people to think there's something there when there's not. It plays into post hoc power calculations. It's an excuse for not finding significance. Besides, if you've reported what you're supposed to, the effect size can be calculated by anyone from your write-up.
Those are odd reasons. For the first one, why are you trying to do your readers' thinking for them? It's even odder when you pair it with a reluctance to perform a calculation for your readers.

I'm not saying anything about the original question, but I am baffled by the quoted reasons.

John

trinker

ggplot2orBust
SmoothJohn, you're baffled, but this is a common argument we make all the time in statistics, e.g. don't use a broken scale because it's misleading. Would you say that pie charts are OK and 3-D broken bar graphs are fine because the reader can figure out the meaning? To me a write-up is about being clear; end of story. If you say one thing and then report a follow-up, what message are you sending?

Your statement "why are you trying to do your readers' thinking for them" is irresponsible at best once we realize we aren't writing for ourselves (or shouldn't be) or for fellow researchers. Generally, we're attempting to influence an applied field or policy. Frankly, I don't care that much whether a fellow researcher can compute an effect size for himself. I'm in education, so I care whether a teacher, administrator, or policy maker reading my study can make decisions based on the article. If not, I've failed in the write-up. If we are writing to impress other researchers, we probably shouldn't be accepting all those grants we get.

Think about who really impacts policy. It's Joe Schmo the politician or Jane Jolly the news reporter who reads that article and reports what they think your findings are. I would argue we need to be aware not only of the intent of our writing but also be on guard against how it may be misused. Think of how many times you read a report of a study in the newspaper or hear about it on the radio, then read the actual study (the journal article) and see it's actually advocating the opposite recommendation from what the journalist disseminated, or that the methods are seriously lacking.

My argument for not supplying an effect size because it can be computed by someone else is also sound on the basis of not wasting ink or your reader's time. Do we report every last step of our mathematical calculations? Heavens no; we report what is necessary and of importance. If the main effect is not significant, why report something that is neither necessary nor important? If it's to help out our pals doing meta-analysis, that argument is unsound for two reasons: a) they're not dumb and can calculate it themselves, so why throw off those readers who might not know statistics; b) meta-analysts assume whoever wrote the study may have gotten it wrong and will compute the effect sizes themselves regardless of whether they're reported.

We already have people who skip over the methods section and head right for the results without questioning things like validity. They just want a p-value. It seems an effect size gives this type of person even more freedom to pick and choose the story they tell about our research: now they can skip over that pesky p-value or confidence interval and move right on to an effect size to say, hey, there's an effect here. The common reader hears "effect" and never thinks population vs. sample. Worse yet, APA 6 doesn't require confidence intervals for an effect size (and they usually aren't given), so it's difficult to make judgments about it.
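On that last point, a confidence interval for the effect size isn't hard to produce even without a canned routine. Here's a rough percentile-bootstrap sketch in Python, with made-up data echoing my outlier example; `cohens_d` and `bootstrap_ci` are hypothetical helpers written for this post, and 2000 resamples is an arbitrary choice:

```python
import random
import statistics
from math import sqrt

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(b) - statistics.mean(a)) / sqrt(sp2)

def bootstrap_ci(a, b, n_boot=2000, alpha=0.05, seed=42):
    """Percentile-bootstrap confidence interval for Cohen's d."""
    rng = random.Random(seed)
    ds = sorted(cohens_d([rng.choice(a) for _ in a],
                         [rng.choice(b) for _ in b])
                for _ in range(n_boot))
    return ds[int(n_boot * alpha / 2)], ds[int(n_boot * (1 - alpha / 2))]

control = [2.1, 1.8, 2.3, 1.9, 2.0, 2.2, 1.7, 2.4, 2.1, 1.9]
treatment = [2.0, 2.2, 1.8, 2.1, 2.3, 1.9, 2.0, 2.2, 1.8, 100.0]  # one outlier

lo, hi = bootstrap_ci(control, treatment)
print(lo, hi)
```

With the outlier in play the interval is very wide and straddles zero, which tells the reader far more than a bare "medium" d on its own would.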

spunky

Can't make spagetti
uhmmm... but i think this is more an issue of data screening than misinterpretation of effect sizes. i could just as easily create an example where a single point is responsible for the significance of your statistics... actually, even better, why not use a well-known one: the Anscombe quartet. but the moral of the story of the Anscombe quartet is to always check your data before you do anything else. i think the same point can be made here. borrowing one of Jake's quotes: think first, regress later.

my guess is the rest of the points you make are matters of good practice rather than reasons not to report effect sizes. maybe i am being naive here, but i'm assuming that whoever reads an article where non-significant results are presented alongside effect sizes has a good-enough understanding of null hypothesis significance testing (NHST) to still interpret them cautiously. say, for instance, several research articles show large effect sizes but small sample sizes. that creates a precedent for new researchers to look into it and maybe say "uhmm... it seems like this research design is optimal but we need a larger sample size. let's try that out and see what we can find". or maybe it creates a precedent for people noticing that there is something 'wrong' with the way the data is being collected and analysed. like in your case, an outlier.

maybe i'm just generally against not giving people stuff. give them all the info they need and let them sort things out themselves. i'm sure it'll generate interesting dialogue and the best ideas will (hopefully) survive in the end...

Lazar

Phineas Packard
I think there is no harm in reporting effect sizes. You indicate you are worried it can lead to a false impression that an effect would have been significant if the sample size had been bigger. However, it can also clarify just how small an effect you have. This is particularly the case when you have many effects on different scales, some of which are significant.

In addition, there are some fields that require it (APA journals) or for which effect size reporting is the standard. Consider that in research with Likert scales we often present standardized betas, to the point where many do not realise they are a form of effect size. Further, in some methods, like the propensity score estimation stage of propensity score matching, we don't care about significance, only effect sizes.

victorxstc

Pirate
I challenged a prof today and said that an effect size should not be reported for non-significant results. He strongly disagreed.

What are your thoughts (I'm ok if I'm wrong but I want to know why). If anyone has specific sources to cite that would be awesome.
I think the experts here, including yourself, have already said everything by now. But before reading their and your valuable comments, let me offer my thoughts and then check whether they were correct.

I think a significant P value is a risky thing when it has the potential to become non-significant by increasing from 0.0499999 to 0.0500001, and when that can happen with slight decreases in sample size or slight increases in variation.

So I don't rely on the P value when deciding whether to report or ignore effect sizes. I personally report considerable effect sizes even if my P value is about 0.1, or sometimes even a little higher than 0.1 (when alpha is only 0.05). As a matter of fact, I agree with those who say a P value is misleading on many occasions (especially when it indicates non-significance merely due to a lack of power) and should be replaced entirely by effect size measures. Therefore, I never sacrifice the effect size for a P value whose magnitude is affected by sample size and whose interpretation hinges on being simply below or above 0.05. I always respect effect size measures and always report them, unless the effect size itself is small and/or the P value is quite large (in the latter case, the effect size is usually small too).

trinker

ggplot2orBust
I see your point, victorxstc, but when you say:

should be replaced totally by effect size measures
That's risky too, in that the two items give us different information. I could see abolishing p-values in favor of confidence intervals, but replacing them with effect sizes can lead to conclusions like the one in the R example I gave above.

I would say I conclude (from everyone's thoughts here) that effect sizes provide valuable insight and should be included, but the writer's language has to be clear about what they are and what they mean in relation to the probabilistic nature of the p-value/confidence interval.

spunky

Can't make spagetti
we could all avoid these difficulties and opt for Bayesian inference

remember people: Bayes leads the way (or at least it does until i'm done with my thesis. i'll pro'lly switch back to factor analysis after that)

SmoothJohn

New Member
And just to be less provocative, trinker: I think the discussion section of a paper is an excellent place to address any places where you are concerned about misleading your readers.

trinker

ggplot2orBust
Oh, Type I errors, how I wish you didn't exist.

I remember when I first started in statistics, after a start in qualitative research, and I was so happy to have truth on my side now. Everything was decided by numbers, and researcher bias didn't exist. What happened to that guy?

Thanks, everyone, for a good debate and reasonable arguments. You've actually convinced me that effect sizes are important additional information. You've also made me a stronger advocate for confidence intervals. I'm not ready to say **** the p-values, full speed ahead, yet, but I feel enriched. Thank you.

victorxstc

Pirate
I see your point, victorxstc, but when you say: ...
That's risky too, in that the two items give us different information. I could see abolishing p-values in favor of confidence intervals, but replacing them with effect sizes can lead to conclusions like the one in the R example I gave above.

I would say I conclude (from everyone's thoughts here) that effect sizes provide valuable insight and should be included, but the writer's language has to be clear about what they are and what they mean in relation to the probabilistic nature of the p-value/confidence interval.
I agree that these can and should both be used together, and that careful interpretation is the key. But even in that case I still think some modifications should be made to that 0.05 thing. [detailed in the comments below]

-----------------------------------------------------

for not: they could influence people to think there's something there when there's not.
Does non-significance necessarily mean that "there is not something"? I think non-significance can simply be an artifact of low power, and there are real situations where adding one specimen to a group of n = 8 can turn the results from non-significant to significant (my personal experience on a couple of occasions). I think (and argued last year) that we should treat the P value as a continuous measure that can be strong or weak within a continuum (something similar to effect sizes), rather than as a binary notion of significant/non-significant. Should science rely on a notion that can be completely altered by adding one specimen to a group of n = 8? Or should it rely on a more flexible, continuous concept?
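That n = 8 flip is easy to reproduce on paper. Here's a toy Python sketch with invented numbers (`pooled_t` is a helper written for this post); the critical values 2.145 and 2.131 are the standard two-sided t-table entries for alpha = 0.05 at df = 14 and df = 15:

```python
import statistics
from math import sqrt

def pooled_t(a, b):
    """Two-sample pooled-variance t statistic."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(b) - statistics.mean(a)) / sqrt(sp2 * (1 / na + 1 / nb))

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [x + 2.6 for x in a]       # a constant shift of 2.6

t1 = pooled_t(a, b)            # 8 per group, df = 14: t ≈ 2.12 < 2.145, not significant
t2 = pooled_t(a, b + [7.1])    # one extra observation at the group mean, df = 15:
                               # t ≈ 2.26 > 2.131, now significant
print(t1, t2)
```

The shift between the groups is identical in both runs; one extra, perfectly unremarkable observation moves the verdict across the 0.05 cliff edge, which is exactly the fragility being described.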

It plays into post hoc power calculations. It's an excuse for not finding significance.
I think "finding statistical significance" should not be the goal of scientific research, as clinical and practical significance can be independent of it in some or many instances. Say I find, for example, that this treatment is three times more effective than the other one (P = 0.067). Should I totally neglect such a nice result just because, when the P value was first introduced, its creators were in the mood for 0.05 (95% confidence, a rounded value), and now my P is 0.017 greater than their mood? I think I should emphasize that this result is indeed extremely worthwhile and that future research with better samples and control for confounders should verify it. And this is while still accepting the notion of P < 0.05. Otherwise, I think I could say "we found, with 93.3% confidence, that our treatment might be three times more successful than the other one"...

Besides, if you've reported what you're supposed to, the effect size can be calculated by anyone from your write-up.
Most of the time it is, I think. But there are situations where the effect size is not calculable. Besides, I have seen meta-analyses in which the authors simply criticized studies for not reporting effect sizes (without trying to calculate those effect sizes themselves for their own meta-analysis!). So stating the effect size might be good, or at least harmless, in my humble opinion. Besides, it can save the time of those readers who wish to know it, while it does not take much space.

spunky

Can't make spagetti
Think about who really impacts policy. It's Joe Schmo the politician or Jane Jolly the news reporter who reads that article and reports what they think your findings are. I would argue we need to be aware not only of the intent of our writing but also be on guard against how it may be misused. Think of how many times you read a report of a study in the newspaper or hear about it on the radio, then read the actual study (the journal article) and see it's actually advocating the opposite recommendation from what the journalist disseminated, or that the methods are seriously lacking.
p-values and effect sizes aside, this is something that has actually bothered me for a while. it seems to me that research doesn't really influence policy unless its results are consistent with joe schmo's or jane jolly's agenda. part of the whole Chicago school district strike that happened last month involves the use of certain dubious statistical methods (called value-added models) to assess teachers' performance. most psychometric research on such value-added models is, at best, inconclusive (and this is putting it mildly). nevertheless, they're starting to be used more routinely now, and they can have a very negative impact on people's lives... for students and teachers alike. unfortunately, nobody listens to the research that questions the validity of these methods... they just... well... get used.

effective1

New Member
A discussion (with numerous references) of whether or not statistical significance should precede estimation of an effect size appears on pages 9-11 of

Grissom, R.J., & Kim, J. J. (2011). Effect sizes for research: Univariate and multivariate applications (2nd ed.). New York: Routledge.

trinker

ggplot2orBust
The prof whom I originally debated has this book (ironically, they cite him in it). I think they did a decent job of capturing the debate we're having here. Thanks for the reference. I think I'm in the "report the effect size even if there is no significance" camp now.

hlsmith

Less is more. Stay pure. Stay poor.
In agreement: the effect size can help readers spot underpowered or overpowered studies. I can use a national databank and show significance all over the place, but the effect size can help the reader judge whether there is clinical significance or not.