# Identify when a value is observed for the FIRST time in a column

#### ledzep

##### Point Mass at Zero
Dear R users,

I want to know when a certain value occurs for the first time in a column. For the sake of argument, let's say I want to know the day on which the first zero response occurs for each person.

Example data:

Code:
test<-data.frame(person=c(rep("A",8), rep("B",8)),day=c(0,1,2,3,7,14,21,32,0,1,2,3,7,14,21,32),response=c(1,0,1,0,0,0,0,0,1,1,1,0,0,1,1,0))
test

> test
   person day response
1       A   0        1
2       A   1        0
3       A   2        1
4       A   3        0
5       A   7        0
6       A  14        0
7       A  21        0
8       A  32        0
9       B   0        1
10      B   1        1
11      B   2        1
12      B   3        0
13      B   7        0
14      B  14        1
15      B  21        1
16      B  32        0
Here, for person A the first zero response was observed on day 1, and for person B it was observed on day 3.

Hence, the final table will look like:

Code:
> test
   person day response first0day
1       A   0        1         1
2       A   1        0         1
3       A   2        1         1
4       A   3        0         1
5       A   7        0         1
6       A  14        0         1
7       A  21        0         1
8       A  32        0         1
9       B   0        1         3
10      B   1        1         3
11      B   2        1         3
12      B   3        0         3
13      B   7        0         3
14      B  14        1         3
15      B  21        1         3
16      B  32        0         3
Many Thanks

#### Dason

Code:
test<-data.frame(person=c(rep("A",8), rep("B",8)),day=c(0,1,2,3,7,14,21,32,0,1,2,3,7,14,21,32),response=c(1,0,1,0,0,0,0,0,1,1,1,0,0,1,1,0))
test

# Needed for ddply
library(plyr)

# Assumes a dataframe is passed in which
# corresponds to all the data for a certain person
# if there is no first 0 day then the result is NA
first0day <- function(x){
x$day[which(x$response == 0)[1]]
}

tmp <- ddply(test, .(person), first0day)
test <- merge(test, tmp)
colnames(test)[4] <- "first0day"
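
As a small aside on why `first0day` gives NA when a person has no zero response, here is a minimal sketch using made-up toy vectors (the names `has_zero`, `no_zero`, `first_hit` and `first_miss` are only for illustration and are not part of the code above):

```r
# Toy vectors (made up for illustration): one group with a zero, one without
day      <- c(0, 1, 2, 3)
has_zero <- c(1, 0, 1, 0)
no_zero  <- c(1, 1, 1, 1)

first_hit  <- which(has_zero == 0)[1]  # index of the first zero
first_miss <- which(no_zero == 0)[1]   # which() is empty, so [1] gives NA

day[first_hit]   # day of the first zero
day[first_miss]  # indexing with NA gives NA
```

So the NA comes from taking the first element of an empty result, not from anything plyr does.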

#### ledzep

##### Point Mass at Zero
Thanks for the code with the comments. I do indeed have some cases where no zero is present, so NA was returned for them.

When no 0 is present, I want to replace the NA with the last day the person was seen.

I have cooked up an example for this situation, and tried to create a function to recode the NAs (if present) to the max value of the day for that person.

Code:
test<-data.frame(person=c(rep("A",8), rep("B",8)),day=c(0,1,2,3,7,14,21,32,0,1,2,3,7,14,21,32),response=c(1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,0))
test

# Needed for ddply
library(plyr)

# Assumes a dataframe is passed in which
# corresponds to all the data for a certain person
# if there is no first 0 day then the result is NA
first0day <- function(x){
x$day[which(x$response == 0)[1]]
}

tmp <- ddply(test, .(person), first0day)
test <- merge(test, tmp)
colnames(test)[4] <- "first0day"
test

#================================================#
#Convert NA's to the last day the person was seen#
#================================================#
foo<- function(x){
if(is.na(x$first0day)) {x$first0day=max(x$day)}
}
tmp1 <- ddply(test, .(person), foo)
test1 <- merge(test, tmp1)
colnames(test1)[5] <- "first0day_1"
test1

###I get this output for test1 WITH PERSON B MISSING
> test1
  person day response first0day first0day_1
1      A   0        1        NA          32
2      A   1        1        NA          32
3      A   2        1        NA          32
4      A   3        1        NA          32
5      A   7        1        NA          32
6      A  14        1        NA          32
7      A  21        1        NA          32
8      A  32        1        NA          32
The code works in that, if NAs are present, a new column is created containing the last day of the visit. But person B is missing. Any thoughts on what I did wrong?

#### Dason

##### Ambassador to the humans
This puts in the last day the person was seen as a filler if there was no first 0 day, just by modifying the function used:

Code:
test<-data.frame(person=c(rep("A",8), rep("B",8), "C", "C"),day=c(0,1,2,3,7,14,21,32,0,1,2,3,7,14,21,32, 72, 87),response=c(1,0,1,0,0,0,0,0,1,1,1,0,0,1,1,0,1, 1))
test

# Needed for ddply
library(plyr)

# Assumes a dataframe is passed in which
# corresponds to all the data for a certain person
# if there is no first 0 day then the last day seen is returned
first0day <- function(x){
ans <- x$day[which(x$response == 0)[1]]
if(is.na(ans)){
ans <- max(x$day)
}
return(ans)
}

tmp <- ddply(test, .(person), first0day)
test <- merge(test, tmp)
colnames(test)[4] <- "first0day"
Not sure if that's what you wanted and I didn't look at your code yet because I need to go catch a bus.
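
On the question of why person B went missing in the earlier `foo` attempt, a likely explanation (not confirmed in the thread) is that an `if` without an `else` returns NULL when its condition is FALSE, and `ddply`, as far as I know, drops groups whose result is NULL. The sketch below shows this in base R only. The pieces `person_a` and `person_b` and the helper `foo_fixed` are made up for illustration, and the condition tests only the first element because `if()` with a condition longer than one is an error since R 4.2.0.

```r
# Same shape as foo() above, but testing only the first element so that
# the if() condition has length one
foo <- function(x) {
  if (is.na(x$first0day[1])) { x$first0day = max(x$day) }
}

# Toy per-person pieces (made up for illustration)
person_a <- data.frame(day = c(0, 32), first0day = NA)  # no zero found
person_b <- data.frame(day = c(0, 32), first0day = 3)   # zero found on day 3

res_a <- foo(person_a)  # value of the assignment, i.e. max(day)
res_b <- foo(person_b)  # condition is FALSE and there is no else: NULL

# A version that always returns a value, so no person would be dropped
foo_fixed <- function(x) {
  if (is.na(x$first0day[1])) max(x$day) else x$first0day[1]
}
```

Under this reading, person A came back with 32 and person B came back with nothing, which matches the output shown in the question.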

#### ledzep

##### Point Mass at Zero
This is precisely what I wanted to do. Thanks a million, Dason.
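
For readers who would rather not load plyr, the same result can probably be had in base R. This is only a sketch under that assumption, not the method used in the thread. The helper name `first0_or_last` and the intermediate `per_person` are hypothetical, and the data is the three-person example from the last answer.

```r
test <- data.frame(
  person   = c(rep("A", 8), rep("B", 8), "C", "C"),
  day      = c(0, 1, 2, 3, 7, 14, 21, 32, 0, 1, 2, 3, 7, 14, 21, 32, 72, 87),
  response = c(1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1)
)

# Hypothetical helper: day of the first zero response for one person,
# or the last day the person was seen if there is no zero
first0_or_last <- function(d) {
  hit <- which(d$response == 0)
  if (length(hit) > 0) d$day[hit[1]] else max(d$day)
}

# One value per person, then look that value up for every row
per_person <- sapply(split(test, test$person), first0_or_last)
test$first0day <- unname(per_person[as.character(test$person)])
test
```

Checking `length(hit)` before indexing avoids the NA step altogether, so no second pass is needed to fill in the last day.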