Using a Correlation Matrix with no data in SPSS

I have a correlation matrix with the correlations among five independent variables and the correlation between each IV and the DV. I would like to use this information to determine the model's R squared when I use the pair of IVs with the greatest correlation to the DV, and then how much explanatory power is added by each subsequent IV until all five are used.

I do not have a data set. Is this possible using SPSS 22? If so, how do I enter the correlations as data, and what syntax should I use?


Less is more. Stay pure. Stay poor.
Hmm, interesting question. It seems like it could fall under structural equation modeling (direct and indirect effects). I am not experienced with SEMs, so I can't help you there.

Though, you might also be able to examine this using simulated data. I have used SAS to simulate data from a correlation matrix (independent variables only), but not with a dependent variable at the same time. So exploration in that area may provide more information.

Exactly how much information do you have (e.g., sample size, effects, etc.)?
Unfortunately, I have only the correlations between variables and nothing else. Something similar to this (ignore the collinearity; that is not a problem with my current matrix):

An Example
From Longley, J. W. (1967). An appraisal of least squares programs for the electronic computer from the point of view of the user. Journal of the American Statistical Association, 62, 819-841.
The Longley data set is often used as an example of a data set that manifests severe multicollinearity among the independent variables.
Variables are:
X1 = GNP Deflator
X2 = GNP (Gross National Product)
X3 = Unemployed
X4 = Armed Forces
X5 = U.S. Population (in millions)
X6 = Year (1947-1962)
Y = Employed (in millions), the dependent variable

Correlations (in lower diagonal form):
        X1     X2     X3     X4     X5     X6      Y
X1   1.000
X2    .992  1.000
X3    .621   .604  1.000
X4    .465   .446   .177  1.000
X5    .979   .991   .687   .364  1.000
X6    .991   .995   .668   .417   .994  1.000
Y     .971   .984   .502   .457   .960   .971  1.000
Determinant of correlation matrix (among Xs: X1-X6) = 0.0000000016
If I wanted to know what the correlation between 2 IVs and the DV was, what syntax would I write?
If I wanted to add additional IVs to the model, how would I write that syntax?


Less is more. Stay pure. Stay poor.
I am not going to look up the paper, but do they describe the format of the variables (e.g., X1 is continuous, taking values between ? and ?, with the following parameters)? For example, was 'Unemployed' categorical? Also, what type of correlations are these (e.g., Pearson, Spearman, biserial)?

Also, these are bivariate correlations, where the statistic did not control for the other variables, I am guessing?


No cake for spunky
I am not sure of the theory, but in practice you can run structural equation models from a covariance/correlation structure only. So you could run the different models and compare the results, which will show the change in R squared. However, when I did SEM, measures other than R squared were used to determine which model was best (there are many).

Many other methods, at least in the software I have seen, won't do this; you have to have the raw data.
The variables are simple percentage values, and these are Pearson correlations. They are bivariate correlations, and I am seeking the multivariate (multiple) correlation. If we know the correlations between X1 and Y, X2 and Y, and X1 and X2, can I write syntax that will give the combined correlation of X1 and X2 with Y?


Less is more. Stay pure. Stay poor.
I am just following along to see what gets posted; I don't know the intricacies of SEM. However, I would imagine you can get that information. It would obviously not be exact, since you are not working with the actual data.
It is possible to estimate a multiple regression model from the correlation matrix.
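For the two-predictor case there is even a standard closed-form answer: the squared multiple correlation follows directly from the three bivariate correlations. Plugging in X1 and X2 from the matrix posted above (r_Y1 = .971, r_Y2 = .984, r_12 = .992):

```latex
R^2_{Y\cdot 12}
  = \frac{r_{Y1}^2 + r_{Y2}^2 - 2\,r_{Y1}\,r_{Y2}\,r_{12}}{1 - r_{12}^2}
  = \frac{.971^2 + .984^2 - 2(.971)(.984)(.992)}{1 - .992^2}
  \approx \frac{.01546}{.01594}
  \approx .970
```

Note that this is barely above r_Y2 squared = .968: once X2 is in the model, X1 adds almost nothing, which is the collinearity problem showing up already with only two predictors.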

Maybe you also need to insert some variances, but you can insert a unit value and think of the estimates as scaled values in units of the standard deviation. You can also insert means, but that will only affect the level, i.e. the intercept.

And also: it is possible to enter covariance/correlation matrices in SPSS and estimate regression relations from them.
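A minimal sketch of what that looks like in SPSS syntax, using the correlations posted above. The MEAN and STDDEV rows are the unit placeholders, and N = 16 is the Longley sample size; substitute the real n for your own matrix (N only matters for the significance tests, not for R squared itself):

```spss
* Read the correlation matrix in as a matrix dataset.
* Default MATRIX DATA format is lower triangular with diagonal.
MATRIX DATA VARIABLES=ROWTYPE_ X1 X2 X3 X4 X5 X6 Y.
BEGIN DATA
MEAN    0    0    0    0    0    0    0
STDDEV  1    1    1    1    1    1    1
N      16   16   16   16   16   16   16
CORR  1
CORR  .992 1
CORR  .621 .604 1
CORR  .465 .446 .177 1
CORR  .979 .991 .687 .364 1
CORR  .991 .995 .668 .417 .994 1
CORR  .971 .984 .502 .457 .960 .971 1
END DATA.

* Hierarchical regression read from the matrix rather than raw data.
* Start with the two IVs most correlated with Y, then add the rest
* one block at a time; CHANGE prints the R-squared change per block.
REGRESSION MATRIX=IN(*)
  /VARIABLES=X1 TO Y
  /STATISTICS=R COEFF CHANGE
  /DEPENDENT=Y
  /METHOD=ENTER X2 X1
  /METHOD=ENTER X6
  /METHOD=ENTER X5
  /METHOD=ENTER X3
  /METHOD=ENTER X4.
```

The Model Summary table will then show R squared and the R squared change at each step, which is exactly the comparison asked about in the original post.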

(Sometimes this is also a good trick: if you have millions of data points, (1) aggregate all the data into one correlation matrix and (2) estimate from the correlation matrix, trying various different models.)


No cake for spunky
I never realized you could estimate a regression from a correlation matrix. That is something I need to find out whether SAS can do as well.
I am a novice and need more direction. I have entered a mean of 0 and a standard deviation of 1, assuming this would make the data workable. When I run regressions using any combination of two IVs, the R squared value for the model is 1.0. What am I doing wrong? What test would you run?
(You don't have to answer if you want to be alone.)
What am I doing wrong?

The Longley dataset is extremely collinear; that is what it is famous for. I guess you can easily find the raw data on the internet somewhere. Search for it and look at the simple correlations. They are high.

It would be better to practice on some other data set that is not as extreme.
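One way to see the severity from the correlations alone is to compute the determinant of the IV correlation matrix. A sketch in SPSS's MATRIX language, using the values posted above (the /FORMAT override is an assumption worth checking against your version; it is there because the value is far too small for the default display):

```spss
* Determinant of the 6 x 6 IV correlation matrix (Longley values).
MATRIX.
COMPUTE RX = {1,    .992, .621, .465, .979, .991;
              .992, 1,    .604, .446, .991, .995;
              .621, .604, 1,    .177, .687, .668;
              .465, .446, .177, 1,    .364, .417;
              .979, .991, .687, .364, 1,    .994;
              .991, .995, .668, .417, .994, 1   }.
COMPUTE D = DET(RX).
PRINT D /FORMAT="E13.6" /TITLE="Determinant of IV correlation matrix".
END MATRIX.
```

This should come out near the 0.0000000016 quoted with the matrix above. A determinant that close to zero means the IVs are almost linearly dependent, which is why nearly any pair of them drives R squared toward 1 on this data set.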
Thank you Greta,

I have a different correlation matrix I am working with; I just had access to the Longley matrix to post as an example. The matrix I am using does not have the same problem with collinearity.