Avoiding loops when operating on a repeated identifier


Very often I struggle to do basic operations in R because I have to control for a repeated identifier.

I work most of the time with "long format" data.

dt <- data.frame(id = c(rep("A1", 3), rep("B1", 3)),
             activity = c(15,17,12,3,4,15),
             begin = c( 0, 0, 1, 0, 1, 2 ) )

For example, computing a time (observation) counter within each identifier:

dt$time <- 1
for(i in 2:nrow(dt)){
  if(dt[i,'id'] == dt[i-1, 'id'])
  {
    dt[i,'time'] <- dt[i-1,'time'] + 1
  }
}

or double-checking for repeated data:

dt$zerocheck <- 0
for(i in 2:nrow(dt)){
  if(dt[i,'id'] == dt[i-1, 'id'] &&
     dt[i,'begin'] == dt[i-1, 'begin'])
  {
    dt$zerocheck[i] <- 1
  }
}

I guess the answer will be something like aggregate by id, but I am not entirely sure.

merge(dt, aggregate(time ~ id, dt, "max"), by = "id", all.x = TRUE)

Any suggestions for avoiding the loops?


Solution

  • To add onto the other examples, you could also use dplyr

    library(dplyr)
    dt %>% group_by(id) %>% 
      mutate(time = row_number()) %>% # creates the control for identifier
      mutate(zerocheck= ifelse(begin==lag(begin), 1, 0)) # checks for repeated data
    

    or you could use a single mutate call like the following (here zerocheck is a logical TRUE/FALSE column rather than 1/0):

    dt %>% 
      group_by(id) %>% 
      mutate(time = row_number(), 
             zerocheck=begin==lag(begin))
    

    The first query has the output:

    Source: local data frame [6 x 5]
    Groups: id
    
      id activity begin time zerocheck
    1 A1       15     0    1        NA
    2 A1       17     0    2         1
    3 A1       12     1    3         0
    4 B1        3     0    1        NA
    5 B1        4     1    2         0
    6 B1       15     2    3         0
    

    For the zerocheck case I simply used lag to check whether the previous value within the group is the same as the current value. This mimics the code in your question, except that the first row of each id has no previous value and so gets NA instead of 0. Of course, if you want to check something else, you can quite easily alter the predicate.
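
For comparison, here is a minimal base R sketch of the same idea using ave(), which applies a function within each group and returns a vector as long as the input. It assumes the toy data from the question; the max_time column name is made up for illustration and stands in for the merge/aggregate attempt. The first row of each id gets 0 rather than NA, as in the loops in the question.

```r
# Same toy data as in the question
dt <- data.frame(id = c(rep("A1", 3), rep("B1", 3)),
                 activity = c(15, 17, 12, 3, 4, 15),
                 begin = c(0, 0, 1, 0, 1, 2))

# Running counter within each id
dt$time <- ave(seq_along(dt$id), dt$id, FUN = seq_along)

# 1 if `begin` equals the previous `begin` within the same id, else 0.
# The first row of each id has no predecessor, so it is set to 0.
dt$zerocheck <- ave(dt$begin, dt$id,
                    FUN = function(x) c(0, as.numeric(diff(x) == 0)))

# Group maximum repeated on every row (hypothetical column name),
# which avoids the aggregate() + merge() round trip
dt$max_time <- ave(dt$time, dt$id, FUN = max)

print(dt)
```

One difference from the loops: ave() groups by id wherever the rows are, while the loops only compare neighbouring rows, so the two agree only when each id's rows are contiguous, as they are in the example.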