I have a data frame with two columns. The second column (unit) mostly contains the first word of the first column (str). See below:
> df <- data.frame(str = c("cups vegetable soup", "cup brown lentils", "carrot", "stalks celery"), unit = c("cups", "cup", NA, "stalks"), stringsAsFactors = FALSE)
> df
                  str   unit
1 cups vegetable soup   cups
2   cup brown lentils    cup
3              carrot   <NA>
4       stalks celery stalks
I want to remove the first word of $str if it matches the value of $unit in the same row.
For that purpose I wrote the function "DelFunction" shown below:
DelFunction <- function(x, y) {
  tokens_x <- x[[1]]
  tokens_y <- y[[1]]
  if ((tokens_x %like% tokens_y) == TRUE) {
    regmatches(tokens_x, regexpr("[a-z]+", tokens_x)) <- ""
  }
  tokens_x
}
I then applied it to the str column with sapply:
df$str <- sapply(df$str, DelFunction, df$unit)
This is the result. As you can see, the code only works for the first row, where the word "cups" is deleted:
> df
                str   unit
1    vegetable soup   cups
2 cup brown lentils    cup
3            carrot   <NA>
4     stalks celery stalks
The goal is the following result:
> df
             str   unit
1 vegetable soup   cups
2  brown lentils    cup
3         carrot   <NA>
4         celery stalks
Does someone know how to approach the problem?
Thanks!
Possible answer:
library(stringr)
library(dplyr, warn.conflicts = FALSE)
df <-
  data.frame(
    str = c(
      "cups vegetable soup",
      "cup brown lentils",
      "carrot",
      "stalks celery"
    ),
    unit = c("cups", "cup", NA, "stalks"),
    stringsAsFactors = FALSE
  )
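df %>%
  # str_replace() is vectorised over str and unit, so each row uses its own unit.
  # unit is treated as a regular expression and only the first match is removed.
  mutate(str = trimws(str_replace(str, unit, ''))) %>%
  # rows with an NA unit come back as NA, so restore their original string
  mutate(str = if_else(is.na(unit), df$str, str)) -> df2
df2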
#>              str   unit
#> 1 vegetable soup   cups
#> 2  brown lentils    cup
#> 3         carrot   <NA>
#> 4         celery stalks
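A base R variant, as a minimal sketch. It assumes the unit, when present, is followed by a single space, and it compares literal text rather than a regular expression, so a unit is removed only when it really is the leading word. The name has_unit is just an illustrative helper, not something from the question.

```r
df <- data.frame(
  str = c("cups vegetable soup", "cup brown lentils", "carrot", "stalks celery"),
  unit = c("cups", "cup", NA, "stalks"),
  stringsAsFactors = FALSE
)

# TRUE where the row has a unit and str begins with that unit plus a space
# (assumption: exactly one space separates the unit from the rest)
has_unit <- !is.na(df$unit) & startsWith(df$str, paste0(df$unit, " "))

# Keep everything after the unit and the space that follows it
df$str[has_unit] <- substring(df$str[has_unit], nchar(df$unit[has_unit]) + 2)

df
```

Because the comparison is literal, units containing regex metacharacters (for example a trailing ".") need no escaping here.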
Another possible answer without changing (much) your original code:
library(DescTools)
df <-
  data.frame(
    str = c(
      "cups vegetable soup",
      "cup brown lentils",
      "carrot",
      "stalks celery"
    ),
    unit = c("cups", "cup", NA, "stalks"),
    stringsAsFactors = FALSE
  )
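DelFunction <- function(x, y) {
  tokens_x <- x
  # "%" is the wildcard in DescTools' %like%, so "cups%" means "starts with cups"
  tokens_y <- paste0(y, "%")
  if (!is.na(y) && tokens_x %like% tokens_y) {
    # blank out the first run of lowercase letters, i.e. the first word
    regmatches(tokens_x, regexpr("[a-z]+", tokens_x)) <- ""
  }
  trimws(tokens_x)
}
# mapply() walks str and unit in parallel, so each row is checked against its own unit
df$str <- mapply(DelFunction, df$str, df$unit, USE.NAMES = FALSE)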
df
#>              str   unit
#> 1 vegetable soup   cups
#> 2  brown lentils    cup
#> 3         carrot   <NA>
#> 4         celery stalks
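As for why the call in the question only changed the first row: sapply(df$str, DelFunction, df$unit) loops over df$str only and hands the entire df$unit vector to every call as y, so y[[1]] is always "cups". The sketch below illustrates this with throwaway anonymous functions (they are not part of the original code) and contrasts it with mapply, which iterates over both vectors in parallel.

```r
str  <- c("cups vegetable soup", "cup brown lentils", "carrot", "stalks celery")
unit <- c("cups", "cup", NA, "stalks")

# sapply() iterates over str only; unit is passed whole to every call,
# so y[[1]] is the first unit each time.
seen_by_sapply <- unname(sapply(str, function(x, y) y[[1]], unit))
seen_by_sapply   # every call saw "cups"

# mapply() iterates over str and unit together, one pair per call.
seen_by_mapply <- unname(mapply(function(x, y) y, str, unit))
seen_by_mapply   # each call saw the unit from its own row
```

So any row-wise fix needs either a function that is vectorised over both columns or a parallel loop such as mapply.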