R: How to add a custom prefix to a string depending on the group?


How can I add a custom prefix to the strings in one column, where the prefix depends on a grouping variable?

Here is the dummy input dataset:

library(tidyverse)
unit <- c(50, 50, 40, 40, 30, 30, 20, 20, 10, 10)

id <- c("A100", "A101", "A102", "A103", "A100", "A101", "A102", "A103", "A101", "A100")
variation <- c("aaa1", "aaa1", "bbb1", "aaa2", "b1","a3", "a1", "b1", "a1", "b1" )
result <- c("Way1", "Way1", "Way2", "Way2", "Way3","Way1", "Way2", "Way3", "Way4", "Way1" )

data <- data.frame(id, variation, result, unit)
data

#      id variation result unit
# 1  A100      aaa1   Way1   50
# 2  A101      aaa1   Way1   50
# 3  A102      bbb1   Way2   40
# 4  A103      aaa2   Way2   40
# 5  A100        b1   Way3   30
# 6  A101        a3   Way1   30
# 7  A102        a1   Way2   20
# 8  A103        b1   Way3   20
# 9  A101        a1   Way4   10
# 10 A100        b1   Way1   10

Is it possible to add a custom prefix to the values in the "variation" column that depends on the "unit" column?

Here is the expected output:

#      id variation result unit
# 1  A100   A1.aaa1   Way1   50
# 2  A101   A1.aaa1   Way1   50
# 3  A102   A2.bbb1   Way2   40
# 4  A103   A2.aaa2   Way2   40
# 5  A100     A3.b1   Way3   30
# 6  A101     A3.a3   Way1   30
# 7  A102     A4.a1   Way2   20
# 8  A103     A4.b1   Way3   20
# 9  A101     A5.a1   Way4   10
# 10 A100     A5.b1   Way1   10

As you can see, rows that share the same "unit" value get the same prefix in "variation".

A dplyr or base R solution is preferred.


Solution

  • # The counter increases by one each time `unit` differs from the previous row
    data |>
      mutate(variation = paste0("A", cumsum(unit != lag(unit, default = first(unit))) + 1, ".", variation))
    

    Output:

         id variation result unit
    1  A100   A1.aaa1   Way1   50
    2  A101   A1.aaa1   Way1   50
    3  A102   A2.bbb1   Way2   40
    4  A103   A2.aaa2   Way2   40
    5  A100     A3.b1   Way3   30
    6  A101     A3.a3   Way1   30
    7  A102     A4.a1   Way2   20
    8  A103     A4.b1   Way3   20
    9  A101     A5.a1   Way4   10
    10 A100     A5.b1   Way1   10
    

    Note: it's probably better to keep the group number in a column of its own rather than pasting it into "variation". The leading "A" doesn't appear to add any information.
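
As a minimal sketch of the note above, the group number can be stored in a separate column and the prefixed label built only when it is needed. The column name `grp` and the object names `with_group`, `runs`, `base_grp` and `labelled` are made up for illustration. The base R part uses `rle()`, which is a different technique from the `lag()` comparison in the answer, but it should give the same numbering for this data.

```r
library(dplyr)

data <- data.frame(
  id        = c("A100", "A101", "A102", "A103", "A100", "A101", "A102", "A103", "A101", "A100"),
  variation = c("aaa1", "aaa1", "bbb1", "aaa2", "b1", "a3", "a1", "b1", "a1", "b1"),
  result    = c("Way1", "Way1", "Way2", "Way2", "Way3", "Way1", "Way2", "Way3", "Way4", "Way1"),
  unit      = c(50, 50, 40, 40, 30, 30, 20, 20, 10, 10)
)

# dplyr: keep the run index in its own column (`grp` is a hypothetical name).
# A new run starts wherever `unit` differs from the previous row.
with_group <- data |>
  mutate(grp = cumsum(unit != lag(unit, default = first(unit))) + 1)

# Base R: rle() returns the lengths of runs of consecutive equal values,
# so repeating each run's position by its length numbers the runs.
runs <- rle(data$unit)
base_grp <- rep(seq_along(runs$lengths), times = runs$lengths)

# The prefixed label can still be derived from the group number on demand.
labelled <- paste0("A", base_grp, ".", data$variation)
```

Both versions number runs of consecutive equal values, so a "unit" value that reappears later in the data would start a new group. If equal values should always share one group regardless of row order, `match(data$unit, unique(data$unit))` is one base R option to consider instead.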