Search code examples
rdataframeduplicatesauto-increment

Auto-increment field based on duplicate values of other fields


I have a dataframe like so:

df <- data.frame(
      COL1 = rep("28", 10),
      COL2 = c(rep("33", 6), rep("45", 4)),
      COL3 = c("10", "10", "10", "10", "23", "23", "22", "22", "22", "29")
    )
> df
#    COL1 COL2 COL3
# 1    28   33   10
# 2    28   33   10
# 3    28   33   10
# 4    28   33   10
# 5    28   33   23
# 6    28   33   23
# 7    28   45   22
# 8    28   45   22
# 9    28   45   22
# 10   28   45   29

Now I want to populate COL4 auto-incrementally based on duplicate values for COL1 to COL3. The output should look like this:

> df
#    COL1 COL2 COL3 COL4
# 1    28   33   10    1
# 2    28   33   10    2
# 3    28   33   10    3
# 4    28   33   10    4
# 5    28   33   23    1
# 6    28   33   23    2
# 7    28   45   22    1
# 8    28   45   22    2
# 9    28   45   22    3
# 10   28   45   29    1

How would I go about doing this? Any help will be appreciated.

Note: columns are not necessarily sorted.


Solution

  • base R

    With base R, use ave:

    vec <- apply(df, 1, paste, collapse = "")
    df$COL4 <- ave(vec, vec, FUN = seq_along)
    

    dplyr

    With dplyr, group_by all columns and use row_number. If they are more columns to group_by, you can use group_by(across(everything())).

    library(dplyr)
    df %>% 
      group_by(COL1, COL2, COL3) %>% 
      mutate(COL4 = row_number())
    

    output

       COL1  COL2  COL3   COL4
       <chr> <chr> <chr> <int>
     1 28    33    10        1
     2 28    33    10        2
     3 28    33    10        3
     4 28    33    10        4
     5 28    33    23        1
     6 28    33    23        2
     7 28    45    22        1
     8 28    45    22        2
     9 28    45    22        3
    10 28    45    29        1