Collapsing rows of a data frame and "summing" the labels of the old rows

gianmarco

TS Contributor
#1
Hello.

Let's suppose I have the following dataframe:
Code:
mydata <- structure(list(none = c(4, 4, 25, 18, 10), light = c(2, 3, 10, 
24, 6), medium = c(3, 7, 12, 33, 7), heavy = c(2, 4, 4, 13, 2
), clust = structure(c(1L, 1L, 2L, 3L, 1L), .Label = c("1", "2", 
"3"), class = "factor")), .Names = c("none", "light", "medium", 
"heavy", "clust"), row.names = c("SM", "JM", "SE", "JE", "SC"
), class = "data.frame")

   none light medium heavy clust
SM    4     2      3     2     1
JM    4     3      7     4     1
SE   25    10     12     4     2
JE   18    24     33    13     3
SC   10     6      7     2     1
What I wish to accomplish is (1) to collapse the rows by cluster membership (indicated by the last column, clust) by summing the counts within each cluster, and (2) to give each collapsed row a new label made up of the labels of the rows it replaces.

Point (1) can be accomplished by:
Code:
aggregate(. ~ clust, data=mydata, sum)
which returns:
Code:
  clust none light medium heavy
1     1   18    11     17     8
2     2   25    10     12     4
3     3   18    24     33    13
I would appreciate suggestions about point (2). I would like output similar to the above, but where the row for (say) cluster 1 is labelled with the names of the rows belonging to that cluster (SM-JM-SC) instead of 1.

Thank you
Best
Gm
 

Dason

Ambassador to the humans
#2
Code:
> nms <- tapply(rownames(mydata), mydata$clust, paste, collapse = "-")
> out <- aggregate(. ~ clust, data=mydata, sum)
> out
  clust none light medium heavy
1     1   18    11     17     8
2     2   25    10     12     4
3     3   18    24     33    13
> rownames(out) <- nms
> out
         clust none light medium heavy
SM-JM-SC     1   18    11     17     8
SE           2   25    10     12     4
JE           3   18    24     33    13
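
The assignment above lines up because both tapply() and aggregate() return their results in the order of the factor levels of clust. As a minimal sketch of a slightly more defensive variant (an assumption on my part, not something requested in the thread), the labels can be looked up by cluster value rather than by position, and the now-redundant clust column can be dropped. The data are rebuilt with data.frame() purely so that the block runs on its own:

```r
# Rebuild the example data from post #1 (same values, written with data.frame())
mydata <- data.frame(
  none   = c(4, 4, 25, 18, 10),
  light  = c(2, 3, 10, 24, 6),
  medium = c(3, 7, 12, 33, 7),
  heavy  = c(2, 4, 4, 13, 2),
  clust  = factor(c(1, 1, 2, 3, 1)),
  row.names = c("SM", "JM", "SE", "JE", "SC")
)

# One label per cluster: the member row names joined with "-"
nms <- tapply(rownames(mydata), mydata$clust, paste, collapse = "-")

# Column sums within each cluster
out <- aggregate(. ~ clust, data = mydata, FUN = sum)

# Look each label up by cluster value instead of relying on row order
rownames(out) <- nms[as.character(out$clust)]

# Optional: drop the cluster column, since the row names now carry that information
out$clust <- NULL
out
```

Whether to drop clust is a matter of taste; keep it if later steps still need the cluster id.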