Non-experts doing statistics

I was thinking of myself in this case. :p I tried this on another thread where people no longer post, so I moved it here.

This is the type of concern that I have a lot.
Using simple examples, we show that unmodeled effect heterogeneity in more than one structural parameter may mask confounding and selection bias, and thus lead to biased estimates. In our simulations, this heterogeneity is indexed by latent (unobserved) group membership. We believe that this setup represents a fairly realistic scenario—one in which the analyst has no choice but to resort to a main-effects-only regression model because she cannot include the desired interaction terms since group membership is unobserved.

I am fairly certain that if this problem occurred I would not catch it. I check for the basic violations of assumptions, but only those. And of course that is one of many, many violations I have run across in my reading. Which raises the critical question: is it useful to run statistical models when you are not truly a statistician? You may generate bad results due to issues like this and never know it...
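To make the quoted setup concrete, here is a quick toy simulation of my own (a simplified sketch, not the paper's exact model; all numbers and variable names are made up). Two latent groups have treatment effects of +2 and -2; the main-effects-only regression the analyst is forced to use reports roughly zero, while an "oracle" fit that could see group membership recovers the heterogeneity:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Latent group membership -- the analyst never observes this
g = rng.integers(0, 2, n)

# A covariate whose distribution differs by group
x = rng.normal(loc=g, scale=1.0, size=n)

# Randomized "treatment" with a heterogeneous effect: +2 in group 0, -2 in group 1
t = rng.integers(0, 2, n)
y = t * (2 - 4 * g) + x + rng.normal(size=n)

# Main-effects-only regression: all the analyst can fit without g
X_main = np.column_stack([np.ones(n), t, x])
b_main, *_ = np.linalg.lstsq(X_main, y, rcond=None)

# "Oracle" regression that includes g and the t*g interaction
X_full = np.column_stack([np.ones(n), t, x, g, t * g])
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

print("main-effects coef on t:", b_main[1])   # near 0: the +2/-2 effects cancel
print("oracle coef on t:      ", b_full[1])   # near +2 (the group-0 effect)
print("oracle t:g interaction:", b_full[4])   # near -4
```

The unsettling part is that standard assumption checks can easily miss this: the treatment is randomized and the fit looks unremarkable, yet the headline coefficient says "no effect" while every subject has an effect of plus or minus 2.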


Less is more. Stay pure. Stay poor.
What non-statisticians don't know they don't know is never known. You just hope people's data are large enough and the methods robust enough. But unfamiliarity with biases is, to me, the real issue. Anyone can run a test and look up assumptions; we have all seen this. But garbage in is garbage out! The real enlightenment comes when you realize that almost every bias can be reframed as a missingness problem (e.g., missing data, confounding, information bias, and selection bias, maybe even chance (sampling or finite-sample issues)).
Unknown unknowns are to be feared. :p

The real problem here, as a non-expert, is whether I should even be running statistics given that I am a non-expert (and given more than a decade invested in this plus graduate classes in statistics, I am doubtful I will ever be an expert). How can you ever be sure your results are accurate given all the things you don't know? There are so many methods that essentially say, "that way of doing statistics generates wrong results." For instance, SEM implicitly says regression is wrong because it does not consider indirect effects, and MLM says much of statistics is wrong for failing to account for clustering (threats to statistical independence).

But the article is actually saying the experts themselves are wrong (or at least those who see themselves as experts), which is actually worse.


Less is more. Stay pure. Stay poor.
It is not well defined since the causes can be varied. I usually think of it as an interaction or moderation effect. But sometimes you can't include the interaction term because the moderator is latent.


Less is more. Stay pure. Stay poor.
@noetsi is your main issue in stats just doubt? Time series seems hard, but most of the others are straightforward after you get enough repetitions in. You have data and time at work, I believe; just play around with the basics and annotate your code. Heck, I don't have much time anymore, but I will write a couple of methods abstracts a year for conferences to force myself to apply some new things. Does your job allow you to go to conferences? If so, submit something so you have to commit to a method and master it. Stats is so broad we can't learn everything, but that is the fun as well: we get to continuously learn.
It's doubt that I am competent enough to get the answers right given the types of issues the author raised (and there are many more). I don't want to give people bad results.

I guess the issue is: can one validly do (should one do) regression, etc., if one is not an expert? Or is the possibility of biased answers too serious? My job, which is 95 percent SQL, does not allow me to go. I just read lots of books and articles and annoy people on blogs. :p

With time series there is no choice. They want something, so something has to be done. And I think so little is known about time series that there are few true experts. :p