Identify when a value is observed for the FIRST time in a column

ledzep

Point Mass at Zero
#1
Dear R users,

I want to know when a certain value occurs for the first time in a column. For the sake of argument, let's say I want to know on which day the first zero response occurs for each person.

Example data:

Code:
test<-data.frame(person=c(rep("A",8), rep("B",8)),day=c(0,1,2,3,7,14,21,32,0,1,2,3,7,14,21,32),response=c(1,0,1,0,0,0,0,0,1,1,1,0,0,1,1,0))
test

> test
   person day response
1       A   0        1
2       A   1        0
3       A   2        1
4       A   3        0
5       A   7        0
6       A  14        0
7       A  21        0
8       A  32        0
9       B   0        1
10      B   1        1
11      B   2        1
12      B   3        0
13      B   7        0
14      B  14        1
15      B  21        1
16      B  32        0
Here, for person A the first zero response was observed on day 1, and for person B on day 3.

So the final table should look like this:

Code:
> test
   person day response first0day
1       A   0        1         1
2       A   1        0         1
3       A   2        1         1
4       A   3        0         1
5       A   7        0         1
6       A  14        0         1
7       A  21        0         1
8       A  32        0         1
9       B   0        1         3
10      B   1        1         3
11      B   2        1         3
12      B   3        0         3
13      B   7        0         3
14      B  14        1         3
15      B  21        1         3
16      B  32        0         3
Many Thanks
 

Dason

Ambassador to the humans
#2
Code:
test<-data.frame(person=c(rep("A",8), rep("B",8)),day=c(0,1,2,3,7,14,21,32,0,1,2,3,7,14,21,32),response=c(1,0,1,0,0,0,0,0,1,1,1,0,0,1,1,0))
test

# Needed for ddply
library(plyr)

# Assumes a dataframe is passed in which
# corresponds to all the data for a certain person
# if there is no first 0 day then the result is NA
first0day <- function(x){
	# which() gives the positions of zero responses; [1] keeps the first one
	x$day[which(x$response == 0)[1]]
}

tmp <- ddply(test, .(person), first0day)
test <- merge(test, tmp)
colnames(test)[4] <- "first0day"
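For readers without plyr, the same lookup can be sketched in base R with `ave()`, which applies a function within each person and writes the result back to every row of that person, keeping the original row order. The helper name `first_zero_day` is hypothetical (not from the thread); like the function above, it gives NA for a person with no zero response.

```r
# Example data from the first post
test <- data.frame(
  person   = c(rep("A", 8), rep("B", 8)),
  day      = rep(c(0, 1, 2, 3, 7, 14, 21, 32), 2),
  response = c(1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0)
)

# Hypothetical helper (not from the thread): for each row, the day of the
# first zero response of that row's person, or NA if the person has none.
first_zero_day <- function(day, response, person) {
  # ave() splits the row indices by person, applies FUN to each group and
  # writes the (length-1, recycled) result back to every row of the group.
  ave(seq_along(day), person, FUN = function(i) {
    day[i][which(response[i] == 0)[1]]
  })
}

test$first0day <- first_zero_day(test$day, test$response, test$person)
test
```

Unlike `ddply()` followed by `merge()`, this adds the column in place, so no renaming step is needed.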
 

ledzep

Point Mass at Zero
#3
Thanks for the commented code. I do indeed have some cases with no zeros, so NA was returned for them.

When there is no zero, I want to replace the NA with the last day the person was seen.

I have cooked up an example for this situation, and tried to create a function to recode the NAs (if present) to the max value of the day for that person.

Code:
test<-data.frame(person=c(rep("A",8), rep("B",8)),day=c(0,1,2,3,7,14,21,32,0,1,2,3,7,14,21,32),response=c(1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,0))
test

# Needed for ddply
library(plyr)

# Assumes a dataframe is passed in which
# corresponds to all the data for a certain person
# if there is no first 0 day then the result is NA
first0day <- function(x){
	x$day[which(x$response == 0)[1]]
}

tmp <- ddply(test, .(person), first0day)
test <- merge(test, tmp)
colnames(test)[4] <- "first0day"
test

#================================================#
#Convert NAs to the last day the person was seen #
#================================================#
foo<- function(x){
	if(is.na(x$first0day)) {x$first0day=max(x$day)} 
}

tmp1 <- ddply(test, .(person), foo)
test1 <- merge(test, tmp1)
colnames(test1)[5] <- "first0day_1"
test1

###I get this output for test1 WITH PERSON B MISSING

> test1
  person day response first0day first0day_1
1      A   0        1        NA          32
2      A   1        1        NA          32
3      A   2        1        NA          32
4      A   3        1        NA          32
5      A   7        1        NA          32
6      A  14        1        NA          32
7      A  21        1        NA          32
8      A  32        1        NA          32
The code works in the sense that, when NA is present, the new column contains the last day of the visit. But person B is missing.

Any thoughts on what I did wrong?
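A likely explanation: `foo` only produces a value when its `if` condition is TRUE. For person B, `first0day` is 3, so the body never runs, and an `if` without an `else` whose condition is FALSE evaluates to `NULL`. `ddply` then appears to drop groups whose result is `NULL` (consistent with the output above), and `merge()` defaults to an inner join (`all = FALSE`), so B's rows disappear. Also, `is.na(x$first0day)` has one element per row, and since R 4.2.0 an `if` condition of length greater than 1 is an error rather than a warning. The sketch below assumes the first step produced NA for A and 3 for B; `foo_fixed` and `no_else` are hypothetical names.

```r
# Second example from the thread: person A never has a zero response.
test <- data.frame(
  person   = c(rep("A", 8), rep("B", 8)),
  day      = rep(c(0, 1, 2, 3, 7, 14, 21, 32), 2),
  response = c(rep(1, 8), 1, 1, 1, 0, 0, 1, 1, 0)
)
# Assumed result of the first step in the post: NA for A, day 3 for B.
test$first0day <- rep(c(NA, 3), each = 8)

# An if() without else evaluates to NULL when the condition is FALSE,
# which is what the original foo returns for person B.
no_else <- function(flag) if (flag) 1
print(is.null(no_else(FALSE)))

# Hypothetical fix: always return exactly one value per person.
foo_fixed <- function(x) {
  d <- x$first0day[1]              # first0day is constant within a person
  if (is.na(d)) max(x$day) else d  # the else branch keeps B's value
}

# Base-R grouping; one named value per person (A and B).
per_person <- sapply(split(test, test$person), foo_fixed)
# as.character() guards against person being a factor (integer indexing).
test$first0day_1 <- unname(per_person[as.character(test$person)])
print(test)
```

With plyr loaded, `foo_fixed` should also work as a drop-in replacement for `foo` inside `ddply(test, .(person), foo_fixed)`, since it now returns a value for every person.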
 

Dason

Ambassador to the humans
#4
This fills in the last day the person was seen when there is no first 0 day, just by modifying the function used:
Code:
test<-data.frame(person=c(rep("A",8), rep("B",8), "C", "C"),day=c(0,1,2,3,7,14,21,32,0,1,2,3,7,14,21,32, 72, 87),response=c(1,0,1,0,0,0,0,0,1,1,1,0,0,1,1,0,1, 1))
test

# Needed for ddply
library(plyr)

# Assumes a dataframe is passed in which
# corresponds to all the data for a certain person
# if there is no first 0 day, the last day the person was seen is used instead
first0day <- function(x){
	ans <- x$day[which(x$response == 0)[1]]
	if(is.na(ans)){
		ans <- max(x$day)
	}
	return(ans)
}

tmp <- ddply(test, .(person), first0day)
test <- merge(test, tmp)
colnames(test)[4] <- "first0day"
Not sure if that's what you wanted; I haven't looked at your code yet because I need to go catch a bus.
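The `ddply()` plus `merge()` approach relies on `merge()`, which by default (`sort = TRUE`) sorts the result by the key column, so rows may not come back in their original order. If that matters, one option is a minimal base-R sketch that computes one value per person and then looks it up by name for every row. It applies the same rule as `first0day()` above; the names `first0day_fill`, `idx` and `per_person` are hypothetical.

```r
# Third example from the thread: person C never has a zero response.
test <- data.frame(
  person   = c(rep("A", 8), rep("B", 8), "C", "C"),
  day      = c(rep(c(0, 1, 2, 3, 7, 14, 21, 32), 2), 72, 87),
  response = c(1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1)
)

# Same rule as first0day() above, on plain vectors (hypothetical name):
# first day with a zero response, else the last day the person was seen.
first0day_fill <- function(day, response) {
  ans <- day[which(response == 0)[1]]
  if (is.na(ans)) max(day) else ans
}

# One value per person, computed from that person's row indices ...
idx <- split(seq_len(nrow(test)), test$person)
per_person <- vapply(idx,
                     function(i) first0day_fill(test$day[i], test$response[i]),
                     numeric(1))
# ... then looked up by name for every row, in place of merge().
test$first0day <- unname(per_person[as.character(test$person)])
test
```

Here `per_person` holds 1 for A, 3 for B and 87 for C, and the rows of `test` keep their original order.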