Random Variables and Correlation

#1
I claim that if x is a normally distributed random variable and y is a uniformly distributed random variable, then x and y cannot have a correlation of 1 or nearly 1. Do you agree with my claim?
 
#3
Thanks for the response. Given a normally distributed random variable x with mean 0 and standard deviation 1, can you tell me how to generate a random variable y such that y is uniformly distributed between 30 and 32 and the correlation between x and y is 1?

If the algorithm is detailed, maybe you can point me to the right book or paper?


Thanks

Bob
 

Dragan

Super Moderator
#4
You can't do it, even if the correlation were |r| < 1, given the way you described the scenario.

In short, what you're ostensibly implying is that y is a linear function of a standard normal random deviate (x). Thus, y would also have to be normally distributed i.e.

y=r*x + Sqrt[1-r^2]*e

where x and e are both unit normal. If r=1, then it becomes

y=r*x.
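
A minimal sketch of the formula above in Python (standard library only). The helper names `pearson` and `linear_mix`, the seed, and the sample size are illustrative assumptions, not something from the post. It draws x and e as independent unit normals, builds y, and prints the sample correlation, which should land close to the chosen r.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def linear_mix(r, n, seed=0):
    """Return (xs, ys) with y = r*x + sqrt(1 - r^2)*e, x and e unit normal."""
    rng = random.Random(seed)  # fixed seed only for repeatability (assumption)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    es = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [r * x + math.sqrt(1.0 - r * r) * e for x, e in zip(xs, es)]
    return xs, ys


xs, ys = linear_mix(r=0.8, n=100_000)
print(round(pearson(xs, ys), 3))  # sample estimate of r
```

Because y is a linear combination of two normals, it is itself normal, which is the point of the argument: this construction cannot produce a uniform y.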


Allow me to elaborate some more...

Suppose x and e were both standardized uniform variables (mean 0, variance 1). Then the correlation between y and x would still be r, but y would not be uniform, because a weighted sum of two independent uniform variables is not itself uniform (its density is trapezoidal), unless of course r=1, which is trivial.

Further, even if x were unit normal and e were unit uniform, y would be neither normal nor uniform for 0 < |r| < 1.


Note: After further review, there is a way you can do this. However, you would have to approximate the normal distribution. The way that comes to mind is using the Generalized Lambda Distribution...I can explain it if you would like.
 
#5
No, because I can take identical values from the two distributions. Their correlation is a priori 1.
It is important to note that I did not imply that I was taking random samples from the two distributions. What I am saying is that if I can purposely select specific values from each of the two distributions, then I can construct a countable infinity of vectors which are identical. The probability of randomly selecting identical vectors is very small indeed, but in mathematics a single counterexample destroys a hypothesis.
 

Dragan

Super Moderator
#6

There is an easy way to get very close to what you're asking. Here's the data generation procedure:

1. Generate U~U(0,1).

2. X=2*U + 30, so X~U(30,32).

3. Y=(U^0.13491245 - (1 - U)^0.13491245) / 0.19745137.

Y follows a Generalized Lambda Distribution and will be very close to a standard normal distribution.

The correlation between X and Y will be about 0.98, which is very close to 1.
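
The three steps above can be sketched in Python as follows (standard library only). The two constants are copied from step 3; the names `gld_normal` and `pearson`, the seed, and the sample size are illustrative assumptions. The script prints the sample mean and standard deviation of Y and the sample correlation between X and Y, so the "close to standard normal" and "about 0.98" claims can be checked empirically.

```python
import math
import random

LAMBDA3 = 0.13491245  # exponent from step 3
LAMBDA2 = 0.19745137  # scale from step 3


def gld_normal(u):
    """Generalized Lambda approximation to a standard normal quantile."""
    return (u ** LAMBDA3 - (1.0 - u) ** LAMBDA3) / LAMBDA2


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)  # fixed seed only for repeatability (assumption)
us = [rng.random() for _ in range(100_000)]   # step 1: U ~ U(0,1)
xs = [2.0 * u + 30.0 for u in us]             # step 2: X ~ U(30,32)
ys = [gld_normal(u) for u in us]              # step 3: Y approx. N(0,1)

mean_y = sum(ys) / len(ys)
sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (len(ys) - 1))
corr_xy = pearson(xs, ys)
print(round(mean_y, 3), round(sd_y, 3), round(corr_xy, 3))
```

Note that X and Y are both deterministic functions of the same U, which is why the correlation is so high. It stays below 1 because Y is a nonlinear function of X.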
 
#7
Follow-up Question

Dragan,

Thanks for the response. The problem I am really trying to solve looks like this. I have n (where n is about 10) random variables. All but one of them are normally distributed. In addition, they are all correlated, and for each pair of variables I have the correlation. Now, assuming that they are all normally distributed, I know how to generate instances of these random variables. However, I do not know how to deal with the one variable that is not normally distributed but uniformly distributed. My plan was to generate it as a normally distributed number and then try to map it to a uniformly distributed number. I am thinking that this still might work if I were to run your algorithm backwards. That is, I need an algorithm that, given a normally distributed number, would generate a uniformly distributed number with a correlation of about 1. Your algorithm above takes a uniformly distributed number and then generates a normally distributed number with a correlation of about 1.

Maybe you could give me an algorithm that takes a normally distributed number and generates a uniformly distributed number with correlation approaching 1. I think this would help me.

I would like your thoughts.
 

Dragan

Super Moderator
#8
Okay, I think I can solve your problem.

What you would do is start with 10 standard normal variables with an arbitrary correlation matrix (one that is positive definite; I suspect you realize this). For simplicity, let's say you want all of the pairwise correlations to be 0.5.

But, the 10-th variable you want to be uniformly distributed. What you do is use the standard normal CDF and transform the 10-th variable from Z (standard normal) to U (these deviates will be uniform on the interval from 0 to 1).

Now, when you do this transformation from Z to U it will change (reduce) the correlation slightly.... 0.50->0.488..

All you need to do is change the initial correlation from 0.50 to 0.511663 for the standard normal variables (Z1,Z2,...,Z9) that are correlated with the 10-th (Uniform) variable. And, that will control for the transformation from Z->U such that all variables have the specified correlations of 0.50 and the specified distributions.

You can get the value of 0.511663 through numerical integration...In this case it's 0.5*Sqrt[Pi/3]=0.511663... - I can explain more on this if this idea works for you.
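
A rough check of the two numbers above for a single pair of variables, in Python (standard library only). This is a sketch under assumptions: `phi` builds the standard normal CDF from `math.erf`, and the names `phi`, `pearson`, `normal_uniform_corr`, the seed, and the sample size are illustrative. It only covers one normal/uniform pair, not the full 10-variable matrix.

```python
import math
import random


def phi(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def normal_uniform_corr(rho_z, n=200_000, seed=2):
    """Sample correlation of Z1 and U = Phi(Z2) when Corr[Z1, Z2] = rho_z."""
    rng = random.Random(seed)  # fixed seed only for repeatability (assumption)
    z1s, us = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        z2 = rho_z * z1 + math.sqrt(1.0 - rho_z ** 2) * e
        z1s.append(z1)
        us.append(phi(z2))  # Z -> U transformation
    return pearson(z1s, us)


unadjusted = normal_uniform_corr(0.5)                            # shrinks below 0.5
adjusted = normal_uniform_corr(0.5 * math.sqrt(math.pi / 3.0))   # starts at 0.511663...
print(round(unadjusted, 3), round(adjusted, 3))
```

The first value should come out near 0.488 and the second near 0.50, up to sampling noise.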
 
#9
Dragan,

Thanks for the response. I believe that your approach is going to solve my problem. Please tell me more, including where those constants come from.

Thanks
Bob
 

Dragan

Super Moderator
#10
Okay (I also have a closed-form solution that will help).

For simplicity, let's reduce the number of variables to 3.

Let's specify a correlation matrix for 3 standard normal variables:

Corr[Z1,Z2]=0.6;
Corr[Z1,Z3]=0.7;
Corr[Z2,Z3]=0.9.

It is easy to generate Z1, Z2, and Z3 with the specified correlations - this we both know.

Next, we want Z3 to be uniformly distributed (U3). So, we transform Z3 as:

U3=Phi(Z3) where U3~U(0,1), where Phi is the standard normal CDF. Note: Many statistical packages will do this transformation (Minitab, SAS, Mathematica, etc, etc). If you don't have one I have some suggestions on how to create your own algorithm.

Now, what is the effect of this transformation on the correlation matrix above? Each correlation that involves Z3 is reduced by a constant factor of Sqrt[3/Pi]=0.9772... That is, the correlation structure is now:

Corr[Z1,Z2]=0.6
Corr[Z1,U3]=0.7*Sqrt[3/Pi]
Corr[Z2,U3]=0.9*Sqrt[3/Pi].

Therefore, you start with a correlation matrix of Z1, Z2, and Z3 of

Corr[Z1,Z2]=0.6;
Corr[Z1,Z3]=0.7*Sqrt[Pi/3];
Corr[Z2,Z3]=0.9*Sqrt[Pi/3].

Then, once you do the transformation on Z3 you will have:
Corr[Z1,Z2]=0.6
Corr[Z1,U3]=0.7
Corr[Z2,U3]=0.9.

You can then transform U3 to X3~U(30,32), i.e. X3=2*U3 + 30. That will have no effect on the correlation structure, because the transformation is linear with a positive slope.

Note: Your upper bound for a correlation between Z1 (or Z2) and Z3 is 1.0 which would give an upper limit of a correlation between Z1 (or Z2) and U3 = Sqrt[3/Pi] after the transformation of Z3.
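
The three-variable recipe above can be sketched end to end in Python (standard library only). Some assumptions to flag: the post does not say how Z1, Z2, Z3 are generated, so the hand-written Cholesky factorization here is just one common choice; `phi` uses `math.erf` for the normal CDF; and all function names, the seed, and the sample size are illustrative.

```python
import math
import random


def phi(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cholesky(a):
    """Lower-triangular L with L * L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)  # raises ValueError if not PD
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


c = math.sqrt(math.pi / 3.0)  # inflation factor for correlations involving Z3
start = [[1.0, 0.6, 0.7 * c],
         [0.6, 1.0, 0.9 * c],
         [0.7 * c, 0.9 * c, 1.0]]
low = cholesky(start)

rng = random.Random(3)  # fixed seed only for repeatability (assumption)
z1s, z2s, x3s = [], [], []
for _ in range(100_000):
    e = [rng.gauss(0.0, 1.0) for _ in range(3)]
    z = [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(3)]
    z1s.append(z[0])
    z2s.append(z[1])
    x3s.append(30.0 + 2.0 * phi(z[2]))  # Z3 -> U3 -> X3 ~ U(30,32)

r12, r13, r23 = pearson(z1s, z2s), pearson(z1s, x3s), pearson(z2s, x3s)
print(round(r12, 3), round(r13, 3), round(r23, 3))
```

The printed correlations should be near 0.6, 0.7, and 0.9. If the inflated matrix is not positive definite (which can happen when a target correlation is close to the Sqrt[3/Pi] limit), `cholesky` fails with a `ValueError` from `math.sqrt`.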
 
#11
Dragan,

Thanks for the response. I am going to be implementing your suggestions in C++ as part of the simulation program I am writing. I am wondering if you can point me to any websites or books that give a more detailed explanation of why your solution works. I would like to know where the sqrt(PI/3) comes from.

Bob
 

Dragan

Super Moderator
#12
Bob: I don't have a source "off the top of my head" and I don't think you really need one because I can just show you where Sqrt[3/Pi] (or Sqrt[Pi/3]) comes from.

Here goes:

Let:
z1 and z2 be standard bivariate normal.

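Phi(z2) = Integrate[(1/Sqrt[2*Pi]) * Exp[-u2^2 / 2], {u2, -Infinity, z2}], the standard normal CDF.

x2 = Sqrt[3]*(2*Phi(z2) - 1), so that x2 is uniform on (-Sqrt[3], Sqrt[3]) with mean 0 and variance 1.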

f12 = The Standard Normal Bivariate Density Function -- where Rho_z1z2 is the bivariate correlation between z1 and z2.

Now, form the Double Integral and impose the assumption: -1<Rho_z1z2< +1.

Thus, because z1 and x2 both have mean 0 and variance 1, the correlation between z1 and x2 (Rho_z1x2) is:

Rho_z1x2 = Integrate[ (z1*x2) *f12, {z1, -Infinity, Infinity}, {z2, -Infinity, Infinity}] = Sqrt[3/Pi]*Rho_z1z2 ... a closed-form solution :)


Therefore, all we need to do is start with Rho_z1z2 = Sqrt[Pi/3]*Rho_z1x2, where Rho_z1x2 is the desired correlation between z1 and x2 (which is uniform), and we get the desired result. Easily done.
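
For anyone who wants to check the closed form numerically, here is a small Python sketch (standard library only) that approximates the double integral with a midpoint rule. The function names, the integration limits of ±8, and the grid size are assumptions chosen for illustration; `phi` uses `math.erf` for the normal CDF.

```python
import math


def phi(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def corr_z1_x2(rho, lim=8.0, n=400):
    """Midpoint-rule approximation of the double integral of z1*x2*f12."""
    h = 2.0 * lim / n
    grid = [-lim + (i + 0.5) * h for i in range(n)]
    norm = 1.0 / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))
    total = 0.0
    for z2 in grid:
        x2 = math.sqrt(3.0) * (2.0 * phi(z2) - 1.0)  # standardized uniform
        for z1 in grid:
            q = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (2.0 * (1.0 - rho * rho))
            total += z1 * x2 * norm * math.exp(-q)   # z1 * x2 * f12
    return total * h * h


closed_form = math.sqrt(3.0 / math.pi)
ratios = {rho: corr_z1_x2(rho) / rho for rho in (0.3, 0.7, 0.9)}
for rho, ratio in ratios.items():
    print(rho, round(ratio, 6), round(closed_form, 6))
```

For each rho, the ratio Rho_z1x2 / Rho_z1z2 should agree with Sqrt[3/Pi] to several decimal places, which is the constant used throughout this thread.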