# What would you do?

#### Sam Bernard

##### New Member
Hello everybody,

I hope this is the right place.

I am writing my thesis on the sports industry and I'm currently facing a statistical issue (of course, you'd say, we are in a statistics forum).

I have to demonstrate the independence between revenues and results on the pitch (represented by a numeric coefficient).
I have data from 2012 to 2015 (4 years) for 33 teams. My hypothesis is that, year after year, revenues are less and less dependent on the results on the pitch.

How would you do that, also considering the small sample size?

Thank you very much to anyone that could help me!

#### axeler

##### New Member
If I understood your problem, I think the simplest way to measure the dependence between two variables (in this case revenues and results, if both are numerical) is to calculate their correlation index (CI), usually called the Pearson correlation coefficient. It measures how strongly one variable tends to increase linearly when the other increases: -1<=CI<=1 and
- if CI is nearly 0 there's almost no linear relationship between the variables (this is the usual evidence for independence, although strictly speaking a CI of 0 does not rule out a nonlinear dependence)
- if CI is nearly 1 there's almost perfect positive correlation ("when the first increases, the second also increases")
- if CI is nearly -1 there's almost perfect inverse correlation ("when the first increases, the second decreases")
You can split your dataset by year, calculate a CI for each year, and check whether it decreases over the years. It can be computed even if you don't have many samples, although with only 33 teams per year each value will be fairly noisy, so small year-to-year differences should be read with caution.
Hope it will help!

#### Sam Bernard

##### New Member
Thanks a lot, yes, it looks like the simplest method. I don't want to take advantage of your kindness, but do you perhaps also know how I can do it in SPSS?

#### axeler

##### New Member
I'm sorry, I don't use SPSS, so I don't know how to do it. I know that R has the function "cor" and Matlab has "corrcoef", so maybe SPSS has something similar. In any case, if you have only two variables (x and y), you can also calculate the correlation index manually:
- First you have to calculate the covariance between the variables:
$$S_{xy} = \frac{\sum_i x_i y_i - n\,\bar{x}\,\bar{y}}{n-1}$$
where $x_i$ and $y_i$ are the values of the variables, $\sum_i$ is the sum of the products over all the observations, $\bar{x}$ and $\bar{y}$ are the variable means and $n$ is the number of observations
- Then you can calculate the CI (its usual symbol is the Greek letter rho):
$$CI_{xy} = \frac{S_{xy}}{S_x S_y}$$
where $S_x$ and $S_y$ are the standard deviations of the variables (the square roots of their variances, e.g. $S_x = \sqrt{S_{xx}}$), not the variances themselves
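
The manual calculation can be sketched in a few lines of Python. This is only an illustration under some assumptions: the function names `covariance` and `correlation_index` are hypothetical, the numbers are made up, and Sx and Sy are taken to be the sample standard deviations (square roots of the variances), which is what keeps the result between -1 and 1.

```python
from math import sqrt

def covariance(x, y):
    """Sample covariance: (sum(xi*yi) - n*xmean*ymean) / (n - 1)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    return (sum(xi * yi for xi, yi in zip(x, y)) - n * x_mean * y_mean) / (n - 1)

def correlation_index(x, y):
    """Correlation index: Sxy / (Sx * Sy), with Sx, Sy standard deviations."""
    s_x = sqrt(covariance(x, x))  # covariance of x with itself is its variance
    s_y = sqrt(covariance(y, y))
    return covariance(x, y) / (s_x * s_y)

# Made-up toy values for illustration only.
x = [100.0, 300.0, 200.0]  # e.g. revenues
y = [1.0, 2.0, 3.0]        # e.g. result coefficients

r = correlation_index(x, y)
print(round(r, 3))
```

Here the covariance is 50, the standard deviations are 100 and 1, so the index comes out at 0.5.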