I need to number each occurrence of every combination of variables (a running count within each combination) and store that as a new variable. For example,
set.seed(2)
V1 <- sample(rep(1:3, 10))
V2 <- rep_len(c("small", "large"), 30)
temp <- cbind(V1, V2)
yields a character matrix (cbind() coerces V1 to character) whose first ten rows look like this:
V1 V2
[1,] "3" "small"
[2,] "3" "large"
[3,] "3" "small"
[4,] "1" "large"
[5,] "2" "small"
[6,] "2" "large"
[7,] "1" "small"
[8,] "3" "large"
[9,] "3" "small"
[10,] "3" "large"
I need a new variable that counts how many times each row's combination of V1 and V2 has appeared so far, up to and including that row. The result should look something like:
V1 V2 V3
[1,] "3" "small" "1"
[2,] "3" "large" "1"
[3,] "3" "small" "2"
[4,] "1" "large" "1"
[5,] "2" "small" "1"
[6,] "2" "large" "1"
[7,] "1" "small" "1"
[8,] "3" "large" "2"
[9,] "3" "small" "3"
[10,] "3" "large" "3"
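V3 is a running count per (V1, V2) combination. To pin that definition down, here is a deliberately simple loop over the ten example rows above. The helper name `running_count_loop` and the `"|"` key separator are illustrative assumptions, and a row-by-row loop like this is not the efficient approach being asked for:

```r
# Ten example rows from the question, typed in directly
v1 <- c(3, 3, 3, 1, 2, 2, 1, 3, 3, 3)
v2 <- c("small", "large", "small", "large", "small",
        "large", "small", "large", "small", "large")

# Hypothetical helper: for each row, how many times its (v1, v2)
# combination has appeared so far, including the current row.
running_count_loop <- function(v1, v2) {
  keys <- paste(v1, v2, sep = "|")  # assumes "|" never occurs in the data
  seen <- list()                    # key -> count seen so far
  out <- integer(length(keys))
  for (i in seq_along(keys)) {
    k <- keys[i]
    # seen[[k]] is NULL the first time a key appears
    n <- if (is.null(seen[[k]])) 1L else seen[[k]] + 1L
    seen[[k]] <- n
    out[i] <- n
  }
  out
}

running_count_loop(v1, v2)
# [1] 1 1 2 1 1 1 1 2 3 3
```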
What's an efficient way to do this? (I don't need them to be character vectors necessarily; I just need a general solution.)
After converting the matrix to a data.frame, we can group by 'V1' and 'V2' and then create the new column as the row number within each group with row_number():
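library(dplyr)
as.data.frame(temp) %>%
  group_by(V1, V2) %>%            # one group per (V1, V2) combination
  mutate(V3 = row_number()) %>%   # position of each row within its group
  ungroup()                       # drop the grouping from the result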
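For comparison, a base R sketch of the same idea without dplyr: ave() splits its first argument by the interaction of the grouping vectors, applies FUN within each group, and puts the results back in the original row order, so FUN = seq_along numbers the rows of each (V1, V2) combination. The data frame `df` here is just the question's ten rows typed in for illustration:

```r
# The question's first ten rows
df <- data.frame(
  V1 = c(3, 3, 3, 1, 2, 2, 1, 3, 3, 3),
  V2 = c("small", "large", "small", "large", "small",
         "large", "small", "large", "small", "large")
)

# Number the rows within each (V1, V2) group, keeping the original order
df$V3 <- ave(seq_len(nrow(df)), df$V1, df$V2, FUN = seq_along)

df$V3
# [1] 1 1 2 1 1 1 1 2 3 3
```

Because ave() returns a vector aligned with the input rows, no re-sorting or joining is needed afterwards.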
Data (the first ten rows from the question):
temp <- structure(list(V1 = c(3L, 3L, 3L, 1L, 2L, 2L, 1L, 3L, 3L, 3L),
                       V2 = c("small", "large", "small", "large", "small",
                              "large", "small", "large", "small", "large")),
                  class = "data.frame", row.names = c(NA, -10L))
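A note on types: because cbind() coerces everything to a character matrix, as.data.frame() on the question's `temp` gives character columns (factors in R versions before 4.0.0). Grouping still works either way, but building the data.frame directly keeps V1 numeric. A minimal sketch, reusing the question's setup; the names `m` and `df` are only for illustration:

```r
set.seed(2)
V1 <- sample(rep(1:3, 10))
V2 <- rep_len(c("small", "large"), 30)

m  <- cbind(V1, V2)       # character matrix: V1 is coerced to character
df <- data.frame(V1, V2)  # data.frame: V1 stays integer

str(m)
str(df)
```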