r, dataframe, recode

Create an "instance number" for each instance of a combination of variable levels


I need to count the instance number of each combination of variables, and turn that into a new variable. For example,

set.seed(2)
V1 <- sample(rep(c(1:3),10))
V2 <- rep_len(c("small", "large"),30)
temp <- cbind(V1,V2)

yields a character matrix (cbind() coerces the numeric V1 to character) whose first ten rows look like this:

       V1  V2     
 [1,] "3" "small"
 [2,] "3" "large"
 [3,] "3" "small"
 [4,] "1" "large"
 [5,] "2" "small"
 [6,] "2" "large"
 [7,] "1" "small"
 [8,] "3" "large"
 [9,] "3" "small"
[10,] "3" "large"

I need a new variable that counts how many times each row's combination of V1 and V2 has occurred so far, reading from the top. The result should look something like:

       V1  V2      V3 
 [1,] "3" "small" "1"
 [2,] "3" "large" "1"
 [3,] "3" "small" "2"
 [4,] "1" "large" "1"
 [5,] "2" "small" "1"
 [6,] "2" "large" "1"
 [7,] "1" "small" "1"
 [8,] "3" "large" "2"
 [9,] "3" "small" "3"
[10,] "3" "large" "3"

What's an efficient way to do this? (I don't need them to be character vectors necessarily; I just need a general solution.)


Solution

  • Convert temp to a data.frame, group by 'V1' and 'V2', and create the new column with row_number(), which numbers the rows within each group in their order of appearance.

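    library(dplyr)

    # temp is a matrix in the question, so convert it before grouping
    as.data.frame(temp) %>%
      group_by(V1, V2) %>%
      mutate(V3 = row_number()) %>%  # running count within each (V1, V2) group
      ungroup()                      # drop the grouping once V3 exists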
    

    Data (the first ten rows from the question, already as a data.frame):

    temp <- structure(list(
        V1 = c(3L, 3L, 3L, 1L, 2L, 2L, 1L, 3L, 3L, 3L),
        V2 = c("small", "large", "small", "large", "small",
               "large", "small", "large", "small", "large")),
        class = "data.frame", row.names = c(NA, -10L))
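
If adding a package is not an option, the same running count can probably be obtained in base R with ave(). This is a swapped-in alternative, not part of the answer above; it is a minimal sketch that assumes the data is already a data.frame with the columns V1 and V2 from the question, rebuilt here so the example runs on its own.

```r
# Example data: the first ten rows from the question, as a data.frame
temp <- data.frame(
  V1 = c(3L, 3L, 3L, 1L, 2L, 2L, 1L, 3L, 3L, 3L),
  V2 = c("small", "large", "small", "large", "small",
         "large", "small", "large", "small", "large")
)

# ave() splits its first argument by the grouping variables, applies FUN
# within each group, and puts the results back in the original row order.
# seq_along() then numbers the rows of each (V1, V2) group as 1, 2, 3, ...
temp$V3 <- ave(seq_len(nrow(temp)), temp$V1, temp$V2, FUN = seq_along)

print(temp)
```

For these ten rows V3 comes out as 1 1 2 1 1 1 1 2 3 3, matching the table in the question. Starting from the matrix in the question, as.data.frame(temp) would be needed first, as in the dplyr version.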