How to Determine if 2 variables are dependent?

#2
You don't need a test if you just want a basic idea. If you look at the plots of citation flow vs trust flow, you can decide for yourself. In most of those plots, as citation flow increases, so does trust flow, so at least at first sight it appears that these two variables depend on each other.
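As a rough illustration (not the real data — the values below are simulated just to show the idea), this kind of eyeballing is a one-line scatterplot in R:
Code:
# Simulated stand-ins for the real citation flow / trust flow scores
set.seed(1)
cf <- runif(50, 10, 80)                 # hypothetical citation flow values
tf <- 0.6 * cf + rnorm(50, sd = 8)      # hypothetical trust flow values, loosely tracking cf

plot(cf, tf, xlab = "Citation flow", ylab = "Trust flow")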
 
#3
How about checking for a bivariate correlation? In SPSS, Analyze -> Correlate -> Bivariate. If your two variables are significantly correlated, they will have stars (asterisks) beside them in the output. If you don't have the raw data, then @duskstar's answer is your best bet.
 
#4
But there are a few higher citation flow values that yield lower trust flow scores, so I wasn't sure. If I wanted to back up my claim that both variables are truly dependent, what test/method should I use? Assuming that the relationship is linear (which I strongly believe it should be), then a coefficient of determination and a coefficient of correlation would certainly come in handy. Anything else?
 

hlsmith

Less is more. Stay pure. Stay poor.
#6
R can typically represent the correlation between two variables, so yes, the R^2 is similar since it is the square. However, you probably don't need to use the R^2, just the R. Present the R (with its standard error) and its p-value and you should be fine for showing a statistically significant correlation, as long as there is not another variable that affects these two variables (confounding / Simpson's paradox). Look up correlation topics such as Pearson's (~ normal, continuous) and Spearman's (not ~ normal, continuous) for the right test.
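For what it's worth, in R each of those tests is a single call (cf and tf here stand in for your citation flow and trust flow vectors, e.g. the simulated ones in the sketch above):
Code:
# cf, tf: numeric vectors of citation flow and trust flow
cor.test(cf, tf, method = "pearson")    # roughly normal, continuous data
cor.test(cf, tf, method = "spearman")   # rank-based, no normality assumption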
 
#7
Warning: correlation coefficients are biased in the presence of heteroscedasticity (Forbes & Rigobon, 2002). You may be interested in their heteroscedasticity-robust correlation coefficient estimator.
 

noetsi

Fortran must die
#8
Another possibility is to regress one on the other and see if the residuals indicate heteroskedasticity (see the sketch at the end of this post). If so, there are corrections depending on the exact form.

R squared values show how much of the variation in the DV the IV explains (in a bivariate model). There is no definition of independence, as far as I know, other than formally having an R squared of zero (which is unlikely in a real-world problem even by chance). So you would decide whether, substantively, the R squared suggests independence.
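A rough R sketch of that residual check, again assuming cf and tf hold the two variables; bptest() is from the lmtest package:
Code:
library(lmtest)                  # for the Breusch-Pagan test

fit <- lm(tf ~ cf)               # regress one variable on the other
plot(fitted(fit), resid(fit))    # a funnel/fan shape suggests heteroskedasticity
bptest(fit)                      # small p-value -> evidence of heteroskedasticity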
 
#10
Ok thanks for the help!

Now, if I were to try figuring out which website is the most trustworthy (based on high citation and trust flow) from the data, how would I go about doing that?

I'm assuming that citation and trust flow are independent because there is not a strong correlation between them.
 
#11
Do you mean "multicollinearity"?
(I'm an SPSS user.) As far as I know, checking for multicollinearity is only possible in linear regression. After defining your dependent variable and independent variables, click "Statistics" and check "Collinearity diagnostics".

In the regression output there will be additional columns, "Tolerance" and "VIF" (Variance Inflation Factor). How to interpret the result? You can use both or just one of them. For example, if we use the VIF: if the value is close to 1, there is no correlation between the independent variables; as the VIF value gets bigger, that variable is increasingly dependent on one or more of the other independent variables.
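For anyone not on SPSS, here is one hypothetical R equivalent (the data frame, variable names, and the simulated values are made up; vif() is from the car package):
Code:
library(car)    # provides vif()

set.seed(1)     # hypothetical data: two correlated IVs and one DV
dat <- data.frame(citation_flow = runif(50, 10, 80))
dat$trust_flow <- 0.6 * dat$citation_flow + rnorm(50, sd = 8)
dat$rank       <- 0.3 * dat$citation_flow + 0.5 * dat$trust_flow + rnorm(50, sd = 5)

fit <- lm(rank ~ citation_flow + trust_flow, data = dat)
vif(fit)        # near 1: little collinearity; values above roughly 5-10 are usually a concern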
 

noetsi

Fortran must die
#14
This isn't true.
How can you have zero explained variation and dependence? The definition of independence is that two variables are not related (normally measured by a correlation of 0). That is what happens when you have an r of 0: the r squared will be zero.
 

Englund

TS Contributor
#16
But Dason, R^2 is equal to the squared correlation coefficient in a linear regression with one IV. R^2 can be calculated in other cases as well, while r cannot. It is not necessarily true that there is no dependence when r=0, but if R^2=0 then there is, by definition, no dependence between the variables.

If r=0 and we have a dependent relationship, the simple linear model is an incorrect model.
 

Dason

Ambassador to the humans
#17
... maybe I'm not following what you're saying but...

Code:
> x <- seq(-10, 10)
> y <- x^2
> o <- lm(y ~ x)
> o

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
  3.667e+01   -5.121e-16  

> summary(o)

Call:
lm(formula = y ~ x)

Residuals:
   Min     1Q Median     3Q    Max 
-36.67 -27.67 -11.67  27.33  63.33 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.667e+01  7.498e+00    4.89 0.000102 ***
x           -5.121e-16  1.238e+00    0.00 1.000000    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 34.36 on 19 degrees of freedom
Multiple R-squared: 2.926e-32,	Adjusted R-squared: -0.05263 
F-statistic: 5.559e-31 on 1 and 19 DF,  p-value: 1
Here we literally have the relationship y = x^2. r^2 is 0 (up to machine rounding error) according to R and it should be exactly 0. All I'm saying is that you can't say that an r or r^2 of 0 means the variables are independent. I have shown in this case that there can be an exact dependence and still we get R^2 = 0.
 

Englund

TS Contributor
#20
I have shown in this case that there can be an exact dependence and still we get R^2 = 0.
But R^2 is calculated on a flawed model, so to speak. The model you're using in this example is a very bad model for explaining this relationship. According to this model, R^2=0 even though we have a real relationship (well, not you and me, but the variables at hand), just like you said. Try fitting a model after transforming all x values to absolute values: what is the R^2 then? Then fit a model with x and x^2 as IVs: what is the R^2? After that, fit a model using only x^2 as the IV: what is R^2 now? (A quick sketch of these fits is at the end of this post.)

My point is that R^2 is, by definition, a measure of covariation between the IVs and the DV, as far as I know. If we have a real relationship but R^2=0, then the model from which we calculate R^2 is a bad model.
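A quick sketch of those three fits, reusing Dason's x and y; this is just one way to check them:
Code:
x <- seq(-10, 10)
y <- x^2

summary(lm(y ~ abs(x)))$r.squared        # no longer near zero once the sign is dropped
summary(lm(y ~ x + I(x^2)))$r.squared    # exact quadratic fit: R^2 = 1
summary(lm(y ~ I(x^2)))$r.squared        # exact fit with x^2 only: R^2 = 1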