
Remove string from column across group of rows in another column

I would like to remove strings in one column from another column, across all rows of a group. In the reprex below, I would like to remove every string in snippet from text in every row that shares the same id. So, for id == "p1", both "apple" and "orange" should be removed from both rows of text for that group, leaving " and "; "apple" should not be removed in other groups, e.g., for id == "p2", text should remain "fruits with apple".

I tried to use dplyr::group_by and stringr::str_remove_all, but that removed nothing (see the output below).

Thank you for your help.


df_in <- tibble::tribble(
  ~id, ~snippet, ~text,
  "p1", "apple", " and orange",
  "p1", "orange", "apple and ",
  "p2", "kiwi", "fruits with apple"
)

df_out <- tibble::tribble(
  ~id, ~snippet, ~text,
  "p1", "apple", " and ",
  "p1", "orange", " and ",
  "p2", "kiwi", "fruits with apple"
)

library(dplyr)

df_in |> 
  group_by(id) |> 
  mutate(text = stringr::str_remove_all(text, snippet))
#> # A tibble: 3 × 3
#> # Groups:   id [2]
#>   id    snippet text               
#>   <chr> <chr>   <chr>              
#> 1 p1    apple   " and orange"      
#> 2 p1    orange  "apple and "       
#> 3 p2    kiwi    "fruits with apple"

Created on 2024-07-16 with reprex v2.0.2
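The grouped attempt leaves text unchanged because str_remove_all() is vectorised over its inputs element by element: text[1] is only matched against snippet[1], text[2] against snippet[2], and so on. group_by() does not change that pairing. A minimal illustration using the two p1 rows (the variable names here are just for the example):

```r
library(stringr)

text    <- c(" and orange", "apple and ")
snippet <- c("apple", "orange")

# Element-wise pairing: " and orange" is searched for "apple",
# "apple and " is searched for "orange", so nothing matches.
row_wise <- str_remove_all(text, snippet)
print(row_wise)

# Collapsing the snippets into one alternation pattern applies
# every snippet to every row instead.
pattern   <- paste(snippet, collapse = "|")   # "apple|orange"
collapsed <- str_remove_all(text, pattern)
print(collapsed)
```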


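  • The trick is to join all snippets of a group into a single regex alternation with |, e.g. "apple|orange" for p1, so that every snippet in the group is removed from every row of that group. Computing that pattern per group (here with .by = id) gives one pattern per group, which is then recycled across the group's rows. So you just need to modify the code like this: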

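    library(dplyr)    # the .by argument requires dplyr >= 1.1.0
    library(stringr)

    df_in <- tibble::tribble(
      ~id, ~snippet, ~text,
      "p1", "apple", " and orange",
      "p1", "orange", "apple and ",
      "p2", "kiwi", "fruits with apple"
    )

    # Within each id, collapse the snippets into one pattern, e.g. "apple|orange"
    df_in |>
      mutate(text = str_remove_all(text, paste(snippet, collapse = "|")),
             .by = id)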
    #> # A tibble: 3 × 3
    #>   id    snippet text               
    #>   <chr> <chr>   <chr>              
    #> 1 p1    apple   " and "            
    #> 2 p1    orange  " and "            
    #> 3 p2    kiwi    "fruits with apple"

    Created on 2024-07-16 with reprex v2.1.1
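One caveat: paste(snippet, collapse = "|") treats each snippet as a regular expression, so a snippet containing metacharacters such as ".", "(" or "+" can match the wrong text. A hedged variant, assuming stringr >= 1.5.0 (where str_escape() is available), escapes each snippet before joining them. The df_meta data below is hypothetical and only chosen to contain metacharacters:

```r
library(dplyr)    # .by requires dplyr >= 1.1.0
library(stringr)  # str_escape() requires stringr >= 1.5.0

# Hypothetical data: snippets contain regex metacharacters.
df_meta <- tibble::tribble(
  ~id,  ~snippet, ~text,
  "p1", "1.5",    "size 105 or (x)",
  "p1", "(x)",    "size 1.5 kg"
)

df_meta_out <- df_meta |>
  mutate(
    # Escape each snippet so it matches literally, then join with "|".
    # Unescaped, "1.5" would also match "105" and "(x)" would match a bare "x".
    text = str_remove_all(text, paste(str_escape(snippet), collapse = "|")),
    .by = id
  )

print(df_meta_out)
```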