r, matrix, indicator

Create indicator


I would like to create a numeric indicator for a data frame such that, for each unique element in one variable (x), it creates a sequence running from 1 up to the length given by the element in another variable (y). For example:

frame <- data.frame(x = c("a", "a", "a", "b", "b"), y = c(3, 3, 3, 2, 2))
frame
  x y
1 a 3
2 a 3
3 a 3
4 b 2
5 b 2

The indicator, z, should look like this:

  x y z
1 a 3 1
2 a 3 2
3 a 3 3
4 b 2 1
5 b 2 2

Any and all help greatly appreciated. Thanks.


Solution

  • No ave?

    frame$z <- with(frame, ave(y,x,FUN=seq_along) )
    frame
    
    #  x y z
    #1 a 3 1
    #2 a 3 2
    #3 a 3 3
    #4 b 2 1
    #5 b 2 2
    

    A data.table version could look like this (thanks to @mnel):

    library(data.table)
    frame <- as.data.table(frame)
    # .N is the number of rows in the current group
    frame[, z := seq_len(.N), by = x]
    

    My original thought was to use:

    frame[,z := .SD[,.I], by=x]
    

    where .SD refers to each subset of the data.table split by x, and .I returns the row numbers of the data.table it is evaluated in. So .SD[,.I] returns the row numbers within each group. As @mnel points out, though, this is inefficient compared to the seq_len(.N) method, because the entire .SD needs to be loaded into memory for each group just to run this calculation.
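
To make "row numbers within each group" concrete, here is a minimal base R sketch of what both ave(y, x, FUN = seq_along) and seq_len(.N) compute for this data. It is only an illustration, not how either function is implemented internally; the names rows_by_group, z_manual and z_ave are made up for this example.

```r
# Same example data as in the question.
frame <- data.frame(x = c("a", "a", "a", "b", "b"), y = c(3, 3, 3, 2, 2))

# Row positions of each group, keyed by the value of x
# (rows_by_group is a hypothetical helper name).
rows_by_group <- split(seq_len(nrow(frame)), frame$x)

# Number the rows of each group from 1 and write the counters back
# into the positions those rows occupy in the full data frame.
z_manual <- integer(nrow(frame))
for (rows in rows_by_group) {
  z_manual[rows] <- seq_along(rows)
}

# The one-liner from the answer, kept for comparison.
z_ave <- with(frame, ave(y, x, FUN = seq_along))

frame$z <- z_manual
print(frame)
print(all(z_manual == z_ave))
```

The loop only ever touches the row positions of one group at a time, which is the same idea as seq_len(.N): the counter depends on the group size, not on the contents of the other columns.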