I often struggle with basic operations in R when they have to be done separately for each unique identifier.
Most of the time I work with "long format" data:
dt <- data.frame(id = c(rep("A1", 3), rep("B1", 3)),
                 activity = c(15, 17, 12, 3, 4, 15),
                 begin = c(0, 0, 1, 0, 1, 2))
For example, computing a time (observation) counter within each identifier:
dt$time <- 1
for (i in 2:nrow(dt)) {
  # same id as the previous row: continue counting
  if (dt[i, 'id'] == dt[i - 1, 'id']) {
    dt[i, 'time'] <- dt[i - 1, 'time'] + 1
  }
}
or double-checking for repeated data:
dt$zerocheck <- 0
for (i in 2:nrow(dt)) {
  # flag rows whose id and begin both match the previous row
  if (dt[i, 'id'] == dt[i - 1, 'id'] &&
      dt[i, 'begin'] == dt[i - 1, 'begin']) {
    dt$zerocheck[i] <- 1
  }
}
I guess the answer will be something like aggregate by id, but I am not entirely sure.
merge(dt, aggregate(time ~ id, dt, "max"), by = c("id"), all.x = TRUE)
Any suggestions for avoiding the loops?
To add onto the other examples, you could also use dplyr:
library(dplyr)
dt %>%
  group_by(id) %>%
  mutate(time = row_number()) %>%                         # observation counter within each id
  mutate(zerocheck = ifelse(begin == lag(begin), 1, 0))   # checks for repeated data
Or you could use a single mutate call like the following (here zerocheck is left as a logical TRUE/FALSE/NA rather than 1/0):
dt %>%
  group_by(id) %>%
  mutate(time = row_number(),
         zerocheck = begin == lag(begin))
The first query has the output:
Source: local data frame [6 x 5]
Groups: id
  id activity begin time zerocheck
1 A1       15     0    1        NA
2 A1       17     0    2         1
3 A1       12     1    3         0
4 B1        3     0    1        NA
5 B1        4     1    2         0
6 B1       15     2    3         0
For the zerocheck case I simply used lag to check whether the previous value was the same as the current value. This mimics the code in your question, except that the first row of each id is NA rather than 0, because lag has no previous value to compare against there. Of course, if you want to check something else, you can quite easily alter the predicate.
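If the NA on the first row of each id is unwanted, one possible tweak is to wrap the comparison in coalesce so that a missing result counts as "not repeated". This is only a sketch, assuming dplyr is installed; the name res is purely illustrative, and the data frame is rebuilt from the question so the snippet runs on its own.

```r
library(dplyr)

# Same example data as in the question
dt <- data.frame(id = c(rep("A1", 3), rep("B1", 3)),
                 activity = c(15, 17, 12, 3, 4, 15),
                 begin = c(0, 0, 1, 0, 1, 2))

res <- dt %>%
  group_by(id) %>%
  mutate(time = row_number(),
         # begin == lag(begin) is NA on the first row of each id;
         # coalesce() turns that NA into FALSE, as.integer() gives 0/1
         zerocheck = as.integer(coalesce(begin == lag(begin), FALSE))) %>%
  ungroup()

res$time       # 1 2 3 1 2 3
res$zerocheck  # 0 1 0 0 0 0
```

With this change the two columns match what the loops in the question produce. To test a different condition, only the expression inside coalesce needs to change.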