I would like to remove strings found in one column from another column, across all rows of a group. In the reprex below, I would like to remove the string in snippet from the string in text in any row of the group id. So, for id == "p1", both "apple" and "orange" should be removed from both rows of text for that group, leaving " and "; "apple" should not be removed for other groups, e.g., for id == "p2", text should remain "fruits with apple".
I tried to use dplyr::group_by and stringr::str_remove_all, which didn't work.
Thank you for your help.
library(dplyr)
df_in <- tibble::tribble(
~id, ~snippet, ~text,
"p1", "apple", " and orange",
"p1", "orange", "apple and ",
"p2", "kiwi", "fruits with apple"
)
df_out <- tibble::tribble(
~id, ~snippet, ~text,
"p1", "apple", " and ",
"p1", "orange", " and ",
"p2", "kiwi", "fruits with apple"
)
# DOESN'T WORK
df_in |>
group_by(id) |>
mutate(text = stringr::str_remove_all(text, snippet))
#> # A tibble: 3 × 3
#> # Groups: id [2]
#> id snippet text
#> <chr> <chr> <chr>
#> 1 p1 apple " and orange"
#> 2 p1 orange "apple and "
#> 3 p2 kiwi "fruits with apple"
Created on 2024-07-16 with reprex v2.0.2
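The trick is to join the group's snippets with | so that the regex matches any of them. str_remove_all is vectorised over its pattern, so in your attempt each row only had its own snippet removed. Collapsing snippet within each id (here with .by = id, which needs dplyr 1.1.0 or later; group_by(id) works too) gives one pattern per group, e.g. "apple|orange" for p1. So you just need to modify the code like this: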
library(tidyverse)
df_in <- tibble::tribble(
~id, ~snippet, ~text,
"p1", "apple", " and orange",
"p1", "orange", "apple and ",
"p2", "kiwi", "fruits with apple"
)
df_in |>
  mutate(
    text = str_remove_all(text, paste(snippet, collapse = "|")),
    .by = id
  )
#> # A tibble: 3 × 3
#> id snippet text
#> <chr> <chr> <chr>
#> 1 p1 apple " and "
#> 2 p1 orange " and "
#> 3 p2 kiwi "fruits with apple"
Created on 2024-07-16 with reprex v2.1.1
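One caveat with the approach above: the collapsed string is treated as a regular expression, so a snippet containing characters such as . or + would not be matched literally. Below is a minimal sketch of one way to guard against that, assuming stringr 1.5.0 or later (for str_escape) and dplyr 1.1.0 or later (for .by). The helper name remove_group_snippets is made up for illustration and is not part of either package.

```r
library(dplyr)
library(stringr)

df_in <- tibble::tribble(
  ~id,  ~snippet, ~text,
  "p1", "apple",  " and orange",
  "p1", "orange", "apple and ",
  "p2", "kiwi",   "fruits with apple"
)

# Inspect the pattern each group ends up with: one alternation per id.
patterns <- df_in |>
  summarise(pattern = paste(snippet, collapse = "|"), .by = id)

# Hypothetical helper (not from the thread): escape regex metacharacters
# in each snippet, then join the group's unique snippets with "|".
remove_group_snippets <- function(text, snippet) {
  pattern <- paste(str_escape(unique(snippet)), collapse = "|")
  str_remove_all(text, pattern)
}

# Applied per id, every row in a group gets all of that group's snippets removed.
result <- df_in |>
  mutate(text = remove_group_snippets(text, snippet), .by = id)
```

For the example data, result should match df_out from the question; the escaping only makes a difference once a snippet contains regex metacharacters.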