I need guidance on figuring out a linear regression model from a dataset

hlsmith

Less is more. Stay pure. Stay poor.
#2
Seems pretty straightforward if you know how to fit a linear regression. The causal question may be based on whatever you are covering in class, but it seems they are referencing unconfoundedness.
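For reference, a minimal sketch of fitting a simple linear regression by ordinary least squares. The data here are simulated and the variable names (`x`, `y`, the true intercept 2.0 and slope 3.0) are assumptions for illustration only, not anything from the assignment.

```python
import numpy as np

# Simulated data (assumption): y depends linearly on x plus random noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=200)

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: minimize the sum of squared residuals.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

The estimates only describe the association between `x` and `y`; whether the slope can be read causally depends on assumptions such as unconfoundedness, which the fit itself cannot check.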
 

noetsi

Fortran must die
#3
I agree with this statement...
If correlation does not prove causation, what statistical test do you use to assess causality? That’s a trick question because no statistical analysis can make that determination.

You might want to look at the quote from the 1965 Hill article on what is required to prove causation, though I suspect this goes beyond what you are asking for. https://statisticsbyjim.com/basics/causation/

Also this: https://towardsdatascience.com/causal-vs-statistical-inference-3f2c3e617220

Most likely they mean the direction of the effect is from Y to X (which includes that Y occurs before X). They also mean that X is measured without error.

Although that is a guess on my part. It's not really a good question.
 

hlsmith

Less is more. Stay pure. Stay poor.
#4
Well, it could be a great question, we don't know, since we don't know what they are covering in the course! Yeah, the Bradford-Hill Criteria are a good place to start. There are some statistical elements that can be examined as well, such as positivity and exchangeability given known and collected confounders.

@noetsi, did you mean X occurs before Y (temporality)? Models typically cannot discern the direction of time when looking at two variables in a cross-section of time. However, some advanced methods can speculate which variable causes the other in time series data, I believe loosely based on the second law of thermodynamics - everything moves toward entropy.
 

noetsi

Fortran must die
#5
Yes, I meant X occurs before Y. In theory you should be able to see a change in X occurring before a change in Y if X causes Y. You can look at the lags of X to see if this does indeed happen (that is, whether changes in X are followed by later changes in Y). This does not prove causality, but it is required for causality. So it's necessary but not sufficient.

Or that is what I have read. I am not sure you could do this in fact with real data since normally both are measured at the same time.
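A minimal sketch of the lag check described above, assuming both series are observed at regular time points. The data are simulated so that Y responds to X two steps later; the function name `lagged_correlations`, the lag of 2, and the coefficient 0.8 are all hypothetical choices for illustration.

```python
import numpy as np

def lagged_correlations(x, y, max_lag):
    """Correlation between x[t - k] and y[t] for each lag k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = {}
    for k in range(max_lag + 1):
        x_earlier = x[: len(x) - k]  # x shifted back by k steps
        y_later = y[k:]              # y aligned k steps after x
        out[k] = float(np.corrcoef(x_earlier, y_later)[0, 1])
    return out

# Simulated series (assumption): y follows x with a delay of 2 time steps.
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
y = rng.normal(scale=0.5, size=n)
y[2:] += 0.8 * x[:-2]

corrs = lagged_correlations(x, y, max_lag=5)
best_lag = max(corrs, key=lambda k: abs(corrs[k]))

for k, r in corrs.items():
    print(f"lag {k}: r = {r:+.3f}")
print("strongest association at lag", best_lag)
```

A strong correlation at a positive lag is consistent with X preceding Y, but as noted it is only a necessary condition: a third variable driving both series with different delays would produce the same pattern.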