Generating random data from correlations

Say I have a list of personality traits: courage, materialism, compassion and so on. I interview some people and assign a rating from -10 to +10 to each trait for each person, then I find the correlation between each pair of traits.

Now I want to create a "random person" by assigning a value to each trait, consistent with the correlations I have found.

Is there a good way to do this?

Also: suppose traits T1 and T2 have correlation C. Given a fixed value X for T1, what does the distribution of T2 look like? Or to put it another way, given that trait T1's value = X, what is the probability of trait T2 having some value Y?


TS Contributor
I'll give it a try (hopefully, someone else wants to comment on this):

T2 = c + beta*T1 + error .

Under the usual regression assumptions, the error is normally distributed with mean 0. If T1 and T2 are both standardized (mean 0, variance 1), then c = 0, beta equals the correlation C, and the error variance is 1 - R² (i.e. 1 - beta²). You could generate a normally distributed random variable with that variance and use it as your error term.
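To make that concrete, here is a minimal sketch in Python (standard library only). It assumes both traits have been standardized to mean 0 and variance 1 and that the errors really are normal; the function name `sample_t2_given_t1` is made up for illustration.

```python
import math
import random

def sample_t2_given_t1(t1_z, r, rng=random):
    """Draw one standardized T2 value given a standardized T1 value.

    Assumes T2 = r*T1 + error, with error ~ Normal(0, 1 - r^2).
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    error_sd = math.sqrt(1.0 - r * r)  # standard deviation of the error term
    return r * t1_z + rng.gauss(0.0, error_sd)

# Example: T1 is one standard deviation above its mean, correlation 0.6.
rng = random.Random(0)
draws = [sample_t2_given_t1(1.0, 0.6, rng) for _ in range(5)]
print(draws)
```

Under these assumptions, T2 given T1 = X is itself normal with mean r*X and variance 1 - r², which is one way to answer the "what does the distribution of T2 look like" part. To get back to the -10 to +10 scale you would multiply by T2's standard deviation and add its mean.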

Alternatively, you could calculate a prediction interval for each subject's estimated T2 value, based on the error variance or standard deviation.

How much all this is affected by the fact that errors are hardly ever exactly normally distributed, and are often more or less heteroscedastic, I am not sure.

With kind regards



TS Contributor
If you have some theory that supports choosing a particular parametric model, or a semi-parametric model like the one Karabiner suggested, you can always do so, but you need to check whether the model fits the data well. Such a model may give you a more elegant result, a nicer interpretation, more powerful predictions, and perhaps an easier basis for developing further theory. But such an elegant model may not exist, and any model you pick may be misspecified.

So, back to your question: if you have a parametric model, you can estimate its parameters to obtain the multivariate distribution, and from that an estimated conditional distribution. Once all the necessary parameters are estimated, you can simulate a multivariate random vector from the fitted model. Whether an easy or efficient method exists is another issue; e.g. for the multivariate normal you can use the Cholesky decomposition, but such an elegant method may not exist for other kinds of multivariate distribution.
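For the multivariate normal case, the Cholesky idea can be sketched roughly like this in plain Python. The names `cholesky` and `random_person` are just for illustration, and the sketch assumes the trait correlation matrix is symmetric positive definite (a matrix pieced together from pairwise correlations is not guaranteed to be).

```python
import math
import random

def cholesky(matrix):
    """Return lower-triangular L such that L times its transpose equals matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                d = matrix[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(d)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def random_person(corr, means, sds, rng=random):
    """Draw one vector of trait values from a multivariate normal model."""
    lower = cholesky(corr)
    z = [rng.gauss(0.0, 1.0) for _ in corr]  # independent standard normals
    # Multiplying by L turns independent draws into correlated ones.
    correlated = [sum(lower[i][k] * z[k] for k in range(i + 1))
                  for i in range(len(corr))]
    return [m + s * c for m, s, c in zip(means, sds, correlated)]

# Example with three hypothetical traits (made-up correlations, means and sds).
corr = [[1.0, 0.6, 0.2],
        [0.6, 1.0, -0.3],
        [0.2, -0.3, 1.0]]
print(random_person(corr, means=[0.0, 1.0, -2.0], sds=[4.0, 3.0, 5.0]))
```

The values that come out are continuous and unbounded, so if you need integer ratings in -10 to +10 you would have to round and clip them, which distorts the correlations somewhat.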

For a non-parametric alternative: you actually have a discrete multivariate random vector with a total of \( 21^k \) support points (assuming integer ratings from -10 to +10, i.e. 21 possible values per trait), where \( k \) is the number of variables, i.e. the dimension of the vector. With enough data, you can always obtain a good estimate of the empirical multivariate probability mass function, which gives you everything you want. The difficulty is that you may not have that much data, especially when \( k \) is large, and you may need to fall back on a parametric approach to reduce this requirement.
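As a rough illustration of the non-parametric route, the empirical probability mass function and a conditional distribution can be estimated by simple counting. This sketch assumes each interviewed person is stored as a tuple of integer ratings; `empirical_pmf` and `conditional_pmf` are hypothetical helper names and the data are made up.

```python
from collections import Counter

def empirical_pmf(observations):
    """Estimate the joint pmf as the relative frequency of each rating vector."""
    counts = Counter(tuple(o) for o in observations)
    total = sum(counts.values())
    return {point: c / total for point, c in counts.items()}

def conditional_pmf(observations, given_index, given_value, target_index):
    """Estimate P(target trait = y | given trait = given_value) by counting."""
    matching = [o[target_index] for o in observations
                if o[given_index] == given_value]
    if not matching:
        return {}  # no one in the sample had that value, so nothing to estimate
    counts = Counter(matching)
    return {value: c / len(matching) for value, c in counts.items()}

# Made-up ratings for two traits (T1, T2) from four people.
data = [(1, 2), (1, 2), (1, 3), (0, 5)]
print(empirical_pmf(data))
print(conditional_pmf(data, given_index=0, given_value=1, target_index=1))
```

Drawing a "random person" from this estimate amounts to picking one of the observed rating vectors at random, e.g. with `random.choice(data)`. The empty result for unseen values shows the data problem: with many traits, most combinations will never appear in the sample.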