# Is it logical to perform PCA and reduction on observations instead of features?

#### stephenwright

I am currently working with GPMSA, a MATLAB code package published by LANL. It builds a Gaussian process model of a simulator and performs regression against experimental data. I am working to understand all of it, but one thing confuses me. PCA is performed on the simulation outputs to reduce their dimension, but instead of reducing the number of features, the code appears to reduce the number of observations. The MATLAB code for this is shown below. The data is read in with the usual structure (observations in rows, features in columns), but it is then transposed before the PCA is performed. As far as I can tell, MATLAB's `svd` function is meant to be used with observations in rows and features in columns.

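```matlab
fname = 'simobs';
ysim = ysim';        % transpose: rows are now outputs, columns are simulation runs
m = size(ysim, 2);   % number of runs (assumed: m is not defined in this excerpt)
% normalize each output across the m runs
ysimmean = mean(ysim, 2);
ysimStd = ysim - repmat(ysimmean, 1, m);
ysimsd = std(ysimStd, 0, 2);
ysimStd = ysimStd ./ repmat(ysimsd, 1, m);
% SVD and reduction
[U, S, V] = svd(ysimStd, 0);            % economy-size SVD
lam = diag(S).^2 / sum(diag(S).^2);     % fraction of variance per component
lam = cumsum(lam);                      % cumulative fraction
pu = sum(lam < 0.99999) + 1;            % components needed for 99.999% of the variance
Ksim = U(:, 1:pu) * S(1:pu, 1:pu) ./ sqrt(m);   % scaled basis, one column per component
```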
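
For anyone without MATLAB, the snippet can be translated roughly line for line into NumPy. This is a sketch, not GPMSA code: the function name `gpmsa_style_basis`, the random `ysim` matrix and its sizes (20 runs, 50 outputs) are hypothetical, and `m` is assumed to mean the number of simulation runs. Printing the shapes shows which axis each array spans after the transpose.

```python
import numpy as np


def gpmsa_style_basis(ysim, tol=0.99999):
    """NumPy sketch of the MATLAB snippet (hypothetical helper, not GPMSA itself).

    ysim: array of shape (m, n_out), one simulation run per row, as read in.
    Returns (Ksim, pu): Ksim has shape (n_out, pu), pu = number of components kept.
    """
    y = ysim.T                                         # ysim' -> (n_out, m): rows = outputs, columns = runs
    m = y.shape[1]                                     # assumed meaning of m: number of runs
    ymean = y.mean(axis=1, keepdims=True)              # mean(ysim,2): mean of each output over the runs
    ystd = y - ymean
    ysd = ystd.std(axis=1, ddof=1, keepdims=True)      # std(...,0,2): sample std, divides by m-1
    ystd = ystd / ysd
    U, s, _ = np.linalg.svd(ystd, full_matrices=False)  # economy-size SVD
    lam = np.cumsum(s**2) / np.sum(s**2)               # cumulative fraction of variance
    pu = int(np.sum(lam < tol)) + 1                    # components needed to reach tol
    Ksim = U[:, :pu] * s[:pu] / np.sqrt(m)             # U(:,1:pu)*S(1:pu,1:pu)./sqrt(m)
    return Ksim, pu


rng = np.random.default_rng(0)
m, n_out = 20, 50                       # hypothetical sizes: 20 runs, 50 outputs per run
ysim = rng.normal(size=(m, n_out))      # rows = runs (observations), columns = outputs
Ksim, pu = gpmsa_style_basis(ysim)
print(Ksim.shape[0])                    # 50: one row per output
print(pu)                               # number of retained components
```

`ddof=1` mirrors MATLAB's `std(X,0,dim)`, which normalizes by N-1, and broadcasting `U[:, :pu] * s[:pu]` scales each column by its singular value, which is what multiplying by the diagonal block of `S` does.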

There is an accompanying paper for this code (DOI 10.1198/016214507000000888), which gives the computational expense of handling a large number of output values as the reason for using PCA to reduce the outputs. However, I don't understand why the reduction would be applied to the observations rather than to the features. Does this make sense to anyone, or can anybody point me in the right direction?
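
One way to see what the transpose alone changes is to compare the SVD of a data matrix in both orientations. The NumPy sketch below uses a hypothetical random matrix (20 runs, 50 outputs) and skips the standardization step; it checks that both orientations give the same singular values, and that the left singular vectors of the transposed matrix, which have one entry per output, match the right singular vectors of the original up to sign.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n_out = 20, 50                   # hypothetical: 20 runs, 50 outputs per run
X = rng.normal(size=(m, n_out))     # runs in rows, outputs in columns (as read in)

# Economy-size SVD of the data as read in, and of its transpose (the orientation used above).
U1, s1, Vt1 = np.linalg.svd(X, full_matrices=False)    # U1: (20, 20), Vt1: (20, 50)
U2, s2, Vt2 = np.linalg.svd(X.T, full_matrices=False)  # U2: (50, 20), Vt2: (20, 20)

# Both orientations have the same singular values.
same_values = np.allclose(s1, s2)

# Columns of U2 (vectors over the outputs) equal columns of V1 = Vt1.T,
# up to a sign flip per column, because X.T = V1 S U1^T.
V1 = Vt1.T
signs = np.sign(np.sum(U2 * V1, axis=0))   # +1 or -1 for each column pair
same_dirs = np.allclose(U2, V1 * signs)

print(same_values, same_dirs)   # True True
print(U2.shape)                 # (50, 20): each column is a direction over the 50 outputs
```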