# Vectorized comparison of column values

#### ferrand

##### New Member
Hi all,

I have a large dataset in which each observation contains a time stamp. Now I'm trying to calculate a new variable x that counts, for each observation, how many other observations happened within +/- 5 minute of that observation. I tried doing this with a for-loop (see below), but for some reason the loop uses 0.7 s per cycle, which makes it way to slow for the amount of observations I have. My hunch is that there is an easy solution to this using lapply, but I have no experience using the apply family of function. Thanks in advance for any help!

Klemens

for (i in 1 : nrow(data1)) {

Thanks again,
Klemens

#### trinker

##### ggplot2orBust
On my machine this takes ~30 seconds to run testing each obs against all others within that ID variable group:

Code:
test <- read.csv("https://dl.dropboxusercontent.com/u/4136460/sampledata.csv")

if (!require("pacman")) install.packages("pacman"); library(pacman)

# convert to data time
setDT(test)[, DateTime := as.POSIXct(as.character(DateTime))]

# difference function
differ <- function(x) unlist(lapply(x, function(y) sum(abs(y - x) < 301 )))

test[, count := differ(DateTime), by = "ID"][, count := count - 1]

Giving:

Code:
           X            DateTime ID count
1:     1 2014-05-20 15:00:11  3     3
2:     2 2014-05-20 15:02:25  3     4
3:     3 2014-05-20 15:03:02  3     4
4:     4 2014-05-20 15:04:44  3     5
5:     5 2014-05-20 15:07:24  3     5
---
43680: 43680 2014-07-13 18:58:05  5    21
43681: 43681 2014-07-13 18:58:21  5    19
43682: 43682 2014-07-13 18:58:51  5    17
43683: 43683 2014-07-13 18:59:13  5    16
43684: 43684 2014-07-13 18:59:29  5    14

#### ferrand

##### New Member
Sorry, I forgot to say thank you back then! It worked perfectly!