I have a dataframe like so:
df <- data.frame(
COL1 = rep("28", 10),
COL2 = c(rep("33", 6), rep("45", 4)),
COL3 = c("10", "10", "10", "10", "23", "23", "22", "22", "22", "29")
)
> df
# COL1 COL2 COL3
# 1 28 33 10
# 2 28 33 10
# 3 28 33 10
# 4 28 33 10
# 5 28 33 23
# 6 28 33 23
# 7 28 45 22
# 8 28 45 22
# 9 28 45 22
# 10 28 45 29
Now I want to populate COL4 auto-incrementally based on duplicate values for COL1 to COL3. The output should look like this:
> df
# COL1 COL2 COL3 COL4
# 1 28 33 10 1
# 2 28 33 10 2
# 3 28 33 10 3
# 4 28 33 10 4
# 5 28 33 23 1
# 6 28 33 23 2
# 7 28 45 22 1
# 8 28 45 22 2
# 9 28 45 22 3
# 10 28 45 29 1
How would I go about doing this? Any help will be appreciated.
Note: columns are not necessarily sorted.
With base R, use ave
:
vec <- apply(df, 1, paste, collapse = "")
df$COL4 <- ave(vec, vec, FUN = seq_along)
With dplyr
, group_by
all columns and use row_number
. If they are more columns to group_by, you can use group_by(across(everything()))
.
library(dplyr)
df %>%
group_by(COL1, COL2, COL3) %>%
mutate(COL4 = row_number())
output
COL1 COL2 COL3 COL4
<chr> <chr> <chr> <int>
1 28 33 10 1
2 28 33 10 2
3 28 33 10 3
4 28 33 10 4
5 28 33 23 1
6 28 33 23 2
7 28 45 22 1
8 28 45 22 2
9 28 45 22 3
10 28 45 29 1