Avoiding loops when performing operations on a repeated identifier

I often struggle with basic operations in R because I have to account for a repeated identifier: each operation has to restart whenever the id changes.

I work most of the time with "long format" data.

dt <- data.frame(id = c(rep("A1", 3), rep("B1", 3)),
                 activity = c(15, 17, 12, 3, 4, 15),
                 begin = c(0, 0, 1, 0, 1, 2))

For example, computing the time (observation number) within each identifier:

dt$time <- 1
for(i in 2:nrow(dt)){
  if(dt[i,'id'] == dt[i-1, 'id'])
  {
    dt[i,'time'] <- dt[i-1,'time'] + 1
  }
}

or flagging rows that repeat the previous row's value within the same identifier:

dt$zerocheck <- 0
for(i in 2:nrow(dt)){
  # && is the scalar "and" intended for if() conditions
  if( dt[i,'id'] == dt[i-1, 'id'] &&
        dt[i,'begin'] == dt[i-1, 'begin'] )
  {
    dt$zerocheck[i] <- 1
  }
}

I guess the answer will be something like aggregate by id, but I am not entirely sure.

merge(dt, aggregate(time ~ id, dt, "max"), by = "id", all.x = TRUE)

Any suggestions for avoiding the loops?

Solution

To add to the other examples, you could also use dplyr:

library(dplyr)
dt %>% group_by(id) %>% 
  mutate(time = row_number()) %>% # observation number within each id
  mutate(zerocheck = ifelse(begin == lag(begin), 1, 0)) # 1 if begin repeats the previous row's value

or you could use a single mutate call, as below. Note that here zerocheck is left as a logical (TRUE/FALSE/NA) rather than 1/0:

dt %>% 
  group_by(id) %>% 
  mutate(time = row_number(), 
         zerocheck=begin==lag(begin))

The first query has the output:

Source: local data frame [6 x 5]
Groups: id

  id activity begin time zerocheck
1 A1       15     0    1        NA
2 A1       17     0    2         1
3 A1       12     1    3         0
4 B1        3     0    1        NA
5 B1        4     1    2         0
6 B1       15     2    3         0

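For the zerocheck case I simply used lag to check whether the previous value within the same id is the same as the current value. This mimics the code in your question, with one difference: the first row of each id comes out as NA rather than 0, because lag has no previous value to compare against there. Of course, if you want to check something else, you can quite easily alter the predicate.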
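As one illustration of altering the predicate, here is a minimal sketch that turns the leading NA of each group into 0, so the result matches the loops in the question exactly. It assumes the dt data frame from the question; the names res, time_base and zerocheck_base are just illustrative. The base R lines at the end are an optional cross-check using ave, not part of the dplyr approach.

```r
library(dplyr)

# Example data from the question
dt <- data.frame(id = c(rep("A1", 3), rep("B1", 3)),
                 activity = c(15, 17, 12, 3, 4, 15),
                 begin = c(0, 0, 1, 0, 1, 2))

res <- dt %>%
  group_by(id) %>%
  mutate(time = row_number(),
         # lag() yields NA on the first row of each id;
         # coalesce() replaces that NA with FALSE before converting to 0/1
         zerocheck = as.integer(coalesce(begin == lag(begin), FALSE))) %>%
  ungroup()

# Base R cross-check (no packages): ave() applies FUN within each id
time_base <- ave(seq_along(dt$id), dt$id, FUN = seq_along)
zerocheck_base <- as.integer(
  ave(dt$begin, dt$id, FUN = function(x) c(FALSE, diff(x) == 0))
)
```

The ungroup() at the end is a design choice: it avoids later operations on res being silently applied per id.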