Unique Counts and Averages over Subgroups

smithosaurus

New Member
Hi all,

This is kinda similar to the question I had last week, but I think it's a little more complicated.

Basically I have two variables, and I want to compute summaries for each unique combination of these variables.

DATE TIME PLAYER SCORE

2012/8/1 Evening BOB 2
2012/8/1 Evening TOM 3
2012/8/1 Evening BOB 1
2012/8/1 Evening BOB 2
2012/8/1 Night BOB 1
2012/8/1 Night BOB 1
2012/8/2 Evening TOM 1
2012/8/2 Evening TOM 3

My goal is to summarize this data based on the two factors, DATE and TIME.
I want to count the unique players for each unique factor combination, and also get the average of the scores. I tried using tapply and rle but with no luck. Here is what the desired result would look like:

DATE      TIME     UNIQUE_PLAYERS  AVERAGE_SCORE

2012/8/1  Evening  2               2
2012/8/1  Night    1               1
2012/8/2  Evening  1               2

I've been at it for over 5 hours and I'm not much closer to a solution than when I started. Any help would be greatly appreciated, thank you.

"Code for Example"
Code:
DATE=c(rep("2012/8/1",6),rep("2012/8/2",2))
TIME=c(rep("Evening",4),rep("Night",2),rep("Evening",2))
PLAYER=c("BOB","TOM",rep("BOB",4),rep("TOM",2))
SCORE=c(2,3,1,2,1,1,1,3)
dat=data.frame(DATE,TIME,PLAYER,SCORE)

trinker

ggplot2orBust
The aggregate function is pretty nice if you're starting out. This gives the mean score for each DATE and TIME combination:

Code:
aggregate(SCORE~DATE+TIME, data = dat, mean)
The plyr package is pretty nice too.
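
If you want to stay with aggregate, here is a rough sketch of one way to get the unique player count as well: run aggregate twice and merge the results. The names players, scores, result, nplayer and avescore are just picked for illustration, and this is only one possible base R approach.

```r
# Rebuild the example data from the original question
DATE   <- c(rep("2012/8/1", 6), rep("2012/8/2", 2))
TIME   <- c(rep("Evening", 4), rep("Night", 2), rep("Evening", 2))
PLAYER <- c("BOB", "TOM", rep("BOB", 4), rep("TOM", 2))
SCORE  <- c(2, 3, 1, 2, 1, 1, 1, 3)
dat    <- data.frame(DATE, TIME, PLAYER, SCORE)

# Number of distinct players in each DATE/TIME group
players <- aggregate(PLAYER ~ DATE + TIME, data = dat,
                     FUN = function(x) length(unique(x)))

# Mean score in each DATE/TIME group
scores <- aggregate(SCORE ~ DATE + TIME, data = dat, FUN = mean)

# Join the two summaries on the grouping columns
result <- merge(players, scores, by = c("DATE", "TIME"))
names(result) <- c("DATE", "TIME", "nplayer", "avescore")
result
```

The merge step is needed because each aggregate call here applies a single function to a single column.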

Dason

Ambassador to the humans
There are other base functions that can do this. But by far I think the easiest way to do this is with the ddply function from the plyr package.

Code:
DATE=c(rep("2012/8/1",6),rep("2012/8/2",2))
TIME=c(rep("Evening",4),rep("Night",2),rep("Evening",2))
PLAYER=c("BOB","TOM",rep("BOB",4),rep("TOM",2))
SCORE=c(2,3,1,2,1,1,1,3)
dat=data.frame(DATE,TIME,PLAYER,SCORE)

library(plyr)
ddply(dat, .(DATE, TIME), summarize,
      nplayer = length(unique(PLAYER)),
      avescore = mean(SCORE))
The general syntax for ddply is ddply(data, .(idvar1, idvar2, however_many_id_vars_you_want), function, additional_parameters_to_function)

summarize is a nice function that will create summaries for you, so in our example we're summarizing the subsets based on DATE and TIME by counting up the number of unique players and by looking at the average score.
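
For comparison with the ddply call above, here is a minimal sketch of the tapply route that the original question mentions trying. It assumes the same dat as in the thread; the object names nplayer and avescore are only illustrative. Note that tapply returns a DATE by TIME matrix rather than a data frame, so combinations that never occur show up as NA.

```r
# Rebuild the example data from the original question
DATE   <- c(rep("2012/8/1", 6), rep("2012/8/2", 2))
TIME   <- c(rep("Evening", 4), rep("Night", 2), rep("Evening", 2))
PLAYER <- c("BOB", "TOM", rep("BOB", 4), rep("TOM", 2))
SCORE  <- c(2, 3, 1, 2, 1, 1, 1, 3)
dat    <- data.frame(DATE, TIME, PLAYER, SCORE)

# Count distinct players per DATE (rows) and TIME (columns)
nplayer <- with(dat, tapply(PLAYER, list(DATE, TIME),
                            function(x) length(unique(x))))

# Mean score per DATE (rows) and TIME (columns)
avescore <- with(dat, tapply(SCORE, list(DATE, TIME), mean))

nplayer
avescore
```

The matrix layout is handy for a quick look, but the data frame returned by ddply is closer to the layout asked for in the question.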

trinker

ggplot2orBust
I misread the question. Dason's right, plyr is probably the way to go on this one.

smithosaurus

New Member
Flawless. Thank you for everything, Dason! You've helped me out so much.