# How do you know if X is moving Y in non-experimental data?

#### noetsi

##### No cake for spunky
Hey, I got tons of points for this question on another board, so it must be important, right? Well, it is extremely important to me, as a data analyst rather than a statistician. It is central to what data analysts do, since we rarely have theory to build on. (I have read a lot in my field and have never seen an article on this in the generic VR literature, so I fall back on the motivation research I last read 25 years ago.)

This is not about causality per se; I know that correlational studies cannot show that. But we will never be able to do random assignment; it would not even be legal. So correlation is what we have. My problem is that I have growing doubts about using regression to show that X causes a change in Y. For example, in our satisfaction survey we measure a wide range of predictors of satisfaction, such as satisfaction with pay, and then use them to predict overall satisfaction. Satisfaction with pay (a dummy predictor variable) has the highest odds ratio in the survey (the DV is a two-level variable, satisfied/not, and I am running logistic regression). If you are satisfied with pay, your odds of being satisfied overall are 19 times those of someone who is not satisfied with pay. The odds ratio therefore suggests that pay is a key factor in overall satisfaction, at least compared to the 30 other predictors.

The problem is that there is little indication this really matters. 90 plus percent of our staff are satisfied (that the number is so high might be part of the problem, because there are only 47 usable dissatisfied cases out of 448 total cases, and we have 31 predictors). And dissatisfaction with pay is extremely high. So pay looks like a key driver of overall satisfaction based on the odds ratios, but satisfaction with pay is very low and overall satisfaction is quite high. It seems that while overall satisfaction reflects real differences between those who are satisfied with pay and those who are not, those differences don't change whether customers are satisfied or not. That confuses me, given my understanding of regression.
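To make the puzzle concrete, here is a minimal sketch in Python of how a large odds ratio can sit alongside high overall satisfaction. The cell counts are hypothetical: they were picked only to match the stated margins (448 cases, 47 dissatisfied overall) and to give a crude odds ratio near 19. The real odds ratio comes from a 31-predictor logistic model, so the actual crude table may look different.

```python
# Hypothetical 2x2 counts (NOT the survey data); chosen only so that the
# margins match the post: 448 cases in total, 47 dissatisfied overall.
a = 119  # satisfied with pay, satisfied overall
b = 1    # satisfied with pay, dissatisfied overall
c = 282  # not satisfied with pay, satisfied overall
d = 46   # not satisfied with pay, dissatisfied overall


def odds_ratio(a, b, c, d):
    """Odds of overall satisfaction, pay-satisfied vs. not pay-satisfied."""
    return (a * d) / (b * c)


def risk_ratio(a, b, c, d):
    """Ratio of the two probabilities of being satisfied overall."""
    return (a / (a + b)) / (c / (c + d))


p_pay_sat = a / (a + b)    # P(overall satisfied | satisfied with pay)
p_pay_unsat = c / (c + d)  # P(overall satisfied | not satisfied with pay)

print(f"P(satisfied | pay ok)     = {p_pay_sat:.3f}")
print(f"P(satisfied | pay not ok) = {p_pay_unsat:.3f}")
print(f"odds ratio = {odds_ratio(a, b, c, d):.1f}")
print(f"risk ratio = {risk_ratio(a, b, c, d):.2f}")
```

With these made-up counts the odds ratio is about 19, yet roughly 86 percent of people who are unhappy with pay are still satisfied overall. When the outcome is very common, an odds ratio can be large while the difference in probabilities stays modest, which may be part of why the two pictures seem to conflict.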

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Please list out all three questions exactly how they are presented. Then create two 2x2 tables stratifying participation in the tables by the variable that is plaguing you. Present these data.

If all data are collected cross-sectionally on the same instrument, you will always be limited. This is why the social sciences can have underlying issues at times.

#### noetsi

##### No cake for spunky
I am not sure what you mean by three questions. I am talking about one IV and one DV, although the IV is one of many predictors in a logistic regression.

The predictor is satisfaction with pay (satisfied/not), a dummy variable.

The DV is overall satisfaction coded the same way.

I am not certain what you are suggesting I do but would be happy to do it. It is certainly true these were collected at the same time on the same instrument.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
You confused me with your prior description. What is conflicting or plaguing you then?

#### noetsi

##### No cake for spunky
That an X appears, based on the odds ratio, to have a big impact on Y. But, for the reasons I stated above, there is external evidence that its impact on Y is not that great. So what bothers me (plaguing is so harsh) is how you can really know whether X is driving Y or the results are coincidental.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
You use alpha to understand coincidental associations. Then present the 2x2 contingency table for these two variables, so we can see this, please.
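A rough sketch of what that could look like in Python, using only the standard library. The helper names `two_by_two` and `fisher_exact_p` are made up for illustration, the two 0/1 lists are toy data rather than the survey responses, and Fisher's exact test is just one reasonable choice for comparing a 2x2 table against alpha.

```python
from collections import Counter
from math import comb


def two_by_two(x, y):
    """Cross-tabulate two 0/1 sequences into counts keyed by (x, y)."""
    counts = Counter(zip(x, y))
    return {(i, j): counts.get((i, j), 0) for i in (1, 0) for j in (1, 0)}


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell,
        # holding all four margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_obs = prob(a)
    k_values = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum every table that is no more likely than the observed one.
    return sum(prob(k) for k in k_values if prob(k) <= p_obs * (1 + 1e-9))


# Toy data only: 1 = satisfied, 0 = not satisfied.
pay = [1] * 8 + [0] * 12
overall = [1] * 7 + [0] * 1 + [1] * 6 + [0] * 6

table = two_by_two(pay, overall)
p_value = fisher_exact_p(table[(1, 1)], table[(1, 0)],
                         table[(0, 1)], table[(0, 0)])

alpha = 0.05  # conventional threshold, an assumption here
print(table)
print(f"p = {p_value:.3f}, below alpha: {p_value < alpha}")
```

A p-value below alpha only says the association is unlikely to be chance alone; it says nothing about whether pay is driving overall satisfaction.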