sapply function in R help

#1
I have read in two .csv files and done some editing.
> a1 <- read.csv("2013.csv", header = TRUE, na.strings = c("NULL", "PrivacySuppressed"))
> a2 <- a1[, 441, drop = FALSE]
> a3 <- a1[, -441, drop = FALSE]
> a4 <- cbind(a1, a2)
> a4 <- a4[, colSums(is.na(a4)) != nrow(a4)]
> mode(a4)
[1] "list"

I need a4 to be numeric, so I used sapply:
> s <- sapply(a4, as.numeric)
> mode(s)
[1] "numeric"

However, the problem is that the column names disappeared:
> names(s)
NULL

All the previous data frames had column names. Sorry, it is impossible to type them here since there are 600 variables (600 different column names). I had names for my columns up to a4. After applying sapply, names() returns NULL. When I just print s, I see the column names, but R is not detecting them as names. Please help. Thank you.
 
#2
sapply is returning a matrix, so the column names are stored in the matrix's dimnames attribute rather than in names; colnames(s) will show them. Wrap the result in data.frame() to get a data frame back.

Code:
s <- data.frame(sapply(a4, as.numeric))
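
To make the difference concrete, here is a minimal sketch on a made-up stand-in for a4 (the column names id and score are assumptions for illustration, not taken from the original data):

```r
# Hypothetical stand-in for a4: a small data frame with one text column.
a4 <- data.frame(id = c("1", "2", "3"), score = c(10.5, 20, NA),
                 stringsAsFactors = FALSE)

s <- sapply(a4, as.numeric)
is.matrix(s)      # TRUE: sapply simplified the result to a matrix
names(s)          # NULL: a matrix has no names attribute
colnames(s)       # "id" "score": the names live in dimnames

s_df <- data.frame(s)
names(s_df)       # "id" "score"
```

Note that data.frame() checks names by default (check.names = TRUE), so duplicated or syntactically invalid column names may come back altered.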
 

bryangoodrich

Probably A Mammal
#3
We don't need to see your real data. Create a small reproducible sample of data and show it here. You can provide us this data using the dput function as shown below.

Code:
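# stringsAsFactors = TRUE keeps B a factor on R >= 4.0, matching the output below;
# sample() is random, so your values will differ
x <- data.frame(A = sample(1:100, 10), B = sample(letters, 10), stringsAsFactors = TRUE)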
dput(x)
# structure(list(A = c(97L, 54L, 11L, 28L, 58L, 80L, 61L, 59L, 
# 30L, 41L), B = structure(c(7L, 2L, 8L, 1L, 5L, 4L, 9L, 3L, 10L, 
# 6L), .Label = c("b", "c", "e", "g", "m", "o", "r", "t", "v", 
# "z"), class = "factor")), .Names = c("A", "B"), row.names = c(NA, 
# -10L), class = "data.frame")
Now anyone can assign that structure directly to an object:

Code:
y <- structure(list(A = c(97L, 54L, 11L, 28L, 58L, 80L, 61L, 59L, 
30L, 41L), B = structure(c(7L, 2L, 8L, 1L, 5L, 4L, 9L, 3L, 10L, 
6L), .Label = c("b", "c", "e", "g", "m", "o", "r", "t", "v", 
"z"), class = "factor")), .Names = c("A", "B"), row.names = c(NA, 
-10L), class = "data.frame")
identical(x, y)
# [1] TRUE
Also, when presenting code, use the code tags like I've done above:

[code]
 ... some code here ...
[/code]

produces

Code:
 ... some code here ...
Honestly, I have no idea what you're trying to do with each of those steps, and you should document that in your code with comments. I'm unclear what you expect your output to be at each step. Your a2 is just a one-column data.frame taken from a1, and a4 binds that column back onto a1, so it appears twice. Your a3 isn't used, so I don't know why you're showing it to us; it's superfluous. Your test (after some thinking on my part) sums the boolean values of is.na for each column of a4 and compares the sum with the number of rows of a4, which drops any column that is entirely NA. If you're going to do boolean tests, you would be better off using any or all directly:

Code:
any(c(TRUE, FALSE, FALSE))  # TRUE
any(c(FALSE, FALSE))  # FALSE
all(c(TRUE, FALSE, TRUE))  # FALSE
all(c(TRUE, TRUE))  # TRUE
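
Applied to the column filter from the question, the same idea might look like the sketch below. The data frame df and its columns are invented for illustration, and vapply is my choice here so that the result is guaranteed to be a logical vector.

```r
# Hypothetical data: column "b" is entirely NA.
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, NA), c = c("x", "y", NA))

# TRUE for columns where every value is NA
all_na <- vapply(df, function(col) all(is.na(col)), logical(1))

# Keep the columns that have at least one non-NA value
kept <- df[, !all_na, drop = FALSE]
names(kept)   # "a" "c"
```

This selects the same columns as colSums(is.na(df)) != nrow(df), but the intent ("not all values are NA") is stated directly.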
I suggest giving an example that reproduces your problem, because the result of your test should still be a data frame (data frames are still list objects under the hood). Doing sapply on a data frame doesn't make much sense unless your result is something simpler than a data frame. sapply tries to simplify your result, but it decides how on its own. Better to be in control of the result yourself. A technique I only learned last year is this:

Code:
x[] <- lapply(x, some.function)
Here lapply operates on each column of x, applying some.function to it. The result is a list, but because I'm assigning it to x[] rather than to x, it is fitted back into the container that x already is (same class, dimensions, and names). For simple transformations like this, the pattern works well.
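
As a rough illustration of that pattern with as.numeric in place of some.function (the data frame below is made up, with numbers stored as text):

```r
# Hypothetical data frame with numbers stored as text.
x <- data.frame(A = c("1", "2", "3"), B = c("4.5", "5.5", NA),
                stringsAsFactors = FALSE)

# Convert every column in place; x stays a data frame and keeps its names
x[] <- lapply(x, as.numeric)

class(x)            # "data.frame"
names(x)            # "A" "B"
sapply(x, class)    # both columns are now "numeric"
```

One caveat: if a column is a factor, as.numeric returns its internal integer codes rather than the values shown, so convert with as.numeric(as.character(col)) in that case.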