Principal Component Analysis in R Help

inferno

New Member
I am a beginner to R. I have read several guides, but still am stuck on this:

I have data in an Excel CSV file, on which I want to run PCA.
I'm not sure how the prcomp function works. The help page states:
prcomp(x, retx = TRUE, center = TRUE, scale. = FALSE,
tol = NULL, ...)

What is x referring to? I tried putting the file name for x, but I get the following error:
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric

What kind of numeric value do I need to put in for x?

Potentially helpful information: my data sheet has around 48 columns and over 7000 rows. I have converted the CSV file into a matrix in R.

bugman

Super Moderator
x is the name of your data frame or matrix (i.e. the name you have given to your data in R, not the file name itself).

inferno

New Member
I put the name of my matrix, but I got the error:
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric

Not sure what to do with this.

bugman

Super Moderator

and then post the output

Lazar

Phineas Packard
This would suggest that not all of the variables in your matrix are numeric. You most likely only want to provide a subset of the variables (I guess), so you will need to pass the principal components function a subsetted dataset. For example, say you have a dataset called myData and you want to do PCA on variables 3 to 100. You can use
Code:
prcomp(myData[,3:100])
When subsetting data frames and matrices, row selection goes on the left side of the comma in the square brackets and column selection on the right, i.e. myData[rows, columns]. So myData[,3:100] says: take all rows, but only columns 3 to 100.
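A minimal sketch of that subsetting idea on a small made-up data frame. The object `myData` and its column names here are hypothetical, not the poster's real data:

```r
# Hypothetical toy data: one character column followed by numeric columns
myData <- data.frame(
  Gene.Name = c("g1", "g2", "g3", "g4", "g5"),
  t0 = c(0.1, 0.4, 0.3, 0.9, 0.5),
  t2 = c(1.2, 0.8, 1.1, 0.3, 0.7),
  t3 = c(2.0, 1.5, 1.7, 0.2, 1.1)
)

# All rows, columns 2 to 4 only (this drops the non-numeric first column)
numericPart <- myData[, 2:4]

# Run PCA on the numeric columns and summarise the components
pca <- prcomp(numericPart)
summary(pca)
```

Passing `myData` itself to prcomp would fail here because of the character column; passing `numericPart` works.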

inferno

New Member
All variables are numeric. I have checked.
My data sheet does have row names, however. Is it possible that R is reading them as data? If so, how can I fix this?

Lazar

Phineas Packard
OK. I'm pretty sure you have not read your data into R correctly. Can you provide your whole script? I ask because:
Code:
> head(inferno)
     [,1]
[1,] "Genes.csv"
is most certainly not what you want. My guess is that you did:
Code:
inferno <- "Genes.csv"
NOPE

You want:
Code:
inferno <- read.csv("Genes.csv")
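To illustrate the difference, here is a rough, self-contained sketch. It writes a tiny made-up CSV to a temporary file (the file contents and the names `csvPath`, `justAName`, `genes` and `genesRn` are assumptions for illustration only) and compares storing the file name with actually reading the file:

```r
# Write a tiny made-up CSV to a temporary file so the sketch is self-contained
csvPath <- tempfile(fileext = ".csv")
writeLines(c(
  "Gene.Name,X0.min,X2.min",
  "geneA,0,0.08",
  "geneB,0,0.77",
  "geneC,1,0.00"
), csvPath)

# Assigning the file name only stores a character string, not the data
justAName <- csvPath
is.character(justAName)

# read.csv() parses the file into a data frame
genes <- read.csv(csvPath)
str(genes)

# If the first column holds row labels, row.names = 1 keeps it out of the data
genesRn <- read.csv(csvPath, row.names = 1)
str(genesRn)
```

With `row.names = 1` the gene names become row names, so only the measurement columns are left as variables.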

inferno

New Member
Thanks! That seemed to be my error. However, the prcomp function still yields an error stating that 'x' must be numeric.

After inputting what you suggested, this is part of the output (it was very long):

  Gene.Name X0.min     X2.min    X3.min
1     78SDA      0 0.07768191 0.3793334
2      SDFK      0 0.77090604 1.7159830
3      SF56      0 0.00000000 0.0000000
4     89SFA      0 0.00000000 0.0000000
5     AFJK2      0 0.00000000 0.0000000
6     SUP23      0 0.00000000 0.0000000

Lazar

Phineas Packard
Well, not all of the variables above are numeric. 78SDA is not numeric; it is a character string. See my first post on how to subset only the variables you need.

inferno

New Member
This is what I get when I omit the first column (which contains the gene names):

> prcomp(infernos[,2:49])
Error in infernos[, 2:49] : subscript out of bounds

inferno

New Member
I have 49 columns of reads under various time points for each gene.
Thank you for the recommendation.

Lazar

Phineas Packard
What do you get when you type in

Code:
str(inferno)  # str shows the structure of an object
and

Code:
dim(inferno)  # dim gives the dimensions of an object: rows first, then columns

inferno

New Member
The dimensions of my data, according to dim(inferno), are 7000 by 48.
I deleted the gene name column, so that my data would not contain any characters besides the header row, for which I set header = TRUE.

I am still getting the error: Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
when I run prcomp :/

inferno

New Member
str(inferno) output:

data.frame 7000 obs. of 48 variables.

I really appreciate your time in helping me, btw!

Lazar

Phineas Packard
Surely that is not all the output? For example:
Code:
> str(iris)
'data.frame':	150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
See after $ Sepal.Length: there is a value 'num'. That means that variable is numeric.

BUT for $ Species: it says 'Factor', so this variable is not numeric.

inferno

New Member
> str(inferno)
'data.frame': 7000 obs. of 48 variables:
 $ X0.min : int 0 0 0 0 0 0 0 0 0 0 ...
 $ X2.min : num 0.0777 0.7709 0 0 0 ...
 $ X3.min : num 0.379 1.716 0 0 0 ...
 $ X4.min : num 0 1.79 0 0 0 ...
The rest of the variables are all "num". Only the first one is "int".

Lazar

Phineas Packard
well there is your answer
Code:
inferno$X0.min <- as.numeric(inferno$X0.min)

would solve the problem. Or just subset it out
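
One possible way to check every column at once before calling prcomp, sketched on a small made-up data frame. The object `genes` and its values are hypothetical stand-ins, not the poster's data:

```r
# Hypothetical stand-in for the data frame discussed in this thread
genes <- data.frame(
  Gene.Name = c("geneA", "geneB", "geneC", "geneD"),
  X0.min = c(0L, 0L, 1L, 2L),
  X2.min = c(0.08, 0.77, 0.00, 0.31),
  X3.min = c(0.38, 1.72, 0.00, 0.95)
)

# Show the class of every column
print(sapply(genes, class))

# Flag the columns that is.numeric() accepts (integer and double both count)
isNum <- sapply(genes, is.numeric)
numericOnly <- genes[, isNum, drop = FALSE]

# Convert the integer column to double explicitly, as suggested above
numericOnly$X0.min <- as.numeric(numericOnly$X0.min)

# Run PCA on the remaining columns
pca <- prcomp(numericOnly)
print(pca$sdev)
```

Any column flagged FALSE in `isNum` (here the gene names) is a candidate for whatever is triggering the "'x' must be numeric" error.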