# Include supplementary individuals in a PCA with PROC PRINCOMP

#### CedricLVQ

##### New Member
Hello,

I am using PCA with PROC PRINCOMP to perform multivariate statistical process control. I would like to build the PCA on only a certain set of individuals and treat the others as supplementary.

Does anyone know how to do it?

I know this functionality exists in other software such as R.

Cédric.

#### ondansetron

##### TS Contributor
What do you mean by the others being "supplementary"?

#### CedricLVQ

##### New Member
> What do you mean by the others being "supplementary"?

Supplementary means that they are not taken into account in the correlation matrix that is decomposed to obtain the eigenvectors.

They are then scored and plotted using these eigenvectors.

Here is an example with R code:

```r
res.pca <- PCA(decathlon2, ind.sup = 24:27,
               quanti.sup = 11:12, quali.sup = 13, graph = FALSE)
```

In this code, the individuals in rows 24 to 27 are supplementary: they are not taken into account when the components are computed.

#### ondansetron

##### TS Contributor
I guess I just wanted to clarify: are you really using "supplementary" to mean a holdout sample (i.e., not used for model estimation)?

#### CedricLVQ

##### New Member
> I guess I just wanted to clarify: are you really using "supplementary" to mean a holdout sample (i.e., not used for model estimation)?

Yes, that's it. The individuals used for model estimation will be the "reference period" and determine the eigenvectors of the PCA model. The confidence ellipse on those individuals (for example 95% or 99.73%, as in the control chart approach) will then be the reference for detecting outliers among the next individuals, which will be scored and plotted without contributing to the eigenvectors.
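For reference, the workflow described here (estimate on the reference period, then score new individuals with the same eigenvectors) can be sketched in SAS by combining PROC PRINCOMP with PROC SCORE. This is only a minimal sketch under assumptions: the data set names `reference` and `new_obs` and the variable list `x1-x10` are hypothetical placeholders, not names from this thread.

```sas
/* Hypothetical names: reference = reference-period individuals,     */
/* new_obs = supplementary individuals, x1-x10 = analysis variables. */

/* Fit the PCA on the reference period only. OUTSTAT= stores the     */
/* means, standard deviations and eigenvectors (_TYPE_='SCORE').     */
proc princomp data=reference out=ref_scored outstat=pca_stats n=2 noprint;
    var x1-x10;
run;

/* Score the supplementary individuals with the stored coefficients. */
/* PROC SCORE standardizes them with the reference means and         */
/* standard deviations found in pca_stats, so they do not influence  */
/* the eigenvectors.                                                 */
proc score data=new_obs score=pca_stats out=new_scored;
    var x1-x10;
run;
```

Both `ref_scored` and `new_scored` should then contain `Prin1` and `Prin2` on the same scale, so the new individuals can be overlaid on a plot of the reference scores and compared with a control ellipse computed from the reference period.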

#### hlsmith

##### Less is more. Stay pure. Stay poor.
I don't know how @ondansetron extrapolated that from your post. Why don't you just create a deterministic data split by pulling out certain rows (in SAS, more likely by observation number, Obs)? You would sort the data on some criterion so the split is repeatable, add a count variable, and then subset in a data step on that count variable or on an existing identifier, something like:

```sas
/* Obs is only a row label in printed output, not a data set variable, */
/* so select rows with the automatic variable _N_ in a subsetting IF.  */
data new_set;
    set old_set;
    if 24 <= _N_ <= 27;
run;
```

There is probably a better way to create both sets at the same time. I can almost see the code: you would use the observation number to create a marker, then split the data based on the marker.
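A minimal sketch of that idea, assuming the rows to hold out are observations 24 to 27 as in the R example: a single DATA step can name two output data sets and route each row with OUTPUT, so no separate marker variable is needed. The names `reference` and `new_obs` are hypothetical.

```sas
/* Split old_set into two data sets in one pass.         */
/* _N_ counts the rows read so far, so sort old_set      */
/* first if the row order is not already fixed.          */
data reference new_obs;
    set old_set;
    if 24 <= _N_ <= 27 then output new_obs;   /* supplementary rows */
    else output reference;                    /* rows used to fit   */
run;
```

If the data already has an identifier such as a date or batch number, conditioning on that variable instead of `_N_` would make the split independent of row order.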