I want to use filter
command from dplyr
along with str_detect
.
library(tidyverse)
dt1 <-
tibble(
No = c(1, 2, 3, 4)
, Text = c("I have a pen.", "I have a book.", "I have a pencile.", "I have a pen and a book.")
)
dt1
# A tibble: 4 x 2
No Text
<dbl> <chr>
1 1 I have a pen.
2 2 I have a book.
3 3 I have a pencile.
4 4 I have a pen and a book.
MatchText <- c("Pen", "Book")
dt1 %>%
filter(str_detect(Text, regex(paste0(MatchText, collapse = '|'), ignore_case = TRUE)))
# A tibble: 4 x 2
No Text
<dbl> <chr>
1 1 I have a pen.
2 2 I have a book.
3 3 I have a pencile.
4 4 I have a pen and a book.
Required Output
I want the following output in more efficient way (since in my original problem there would be many unknown element of MatchText).
dt1 %>%
filter(str_detect(Text, regex("Pen", ignore_case = TRUE))) %>%
select(-Text) %>%
mutate(MatchText = "Pen") %>%
bind_rows(
dt1 %>%
filter(str_detect(Text, regex("Book", ignore_case = TRUE))) %>%
select(-Text) %>%
mutate(MatchText = "Book")
)
# A tibble: 5 x 2
No MatchText
<dbl> <chr>
1 1 Pen
2 3 Pen
3 4 Pen
4 2 Book
5 4 Book
Any hint to accomplish the above task more efficiently.
library(tidyverse)
dt1 %>%
mutate(
result = str_extract_all(Text, regex(paste0("\\b", MatchText, "\\b", collapse = '|'),ignore_case = TRUE))
) %>%
unnest(result) %>%
select(-Text)
# # A tibble: 4 x 2
# No result
# <dbl> <chr>
# 1 1 pen
# 2 2 book
# 3 4 pen
# 4 4 book
I'm not sure what happened to the "whole words" part of your question after edits - I left in the word boundaries to match whole words, but since "pen" isn't a whole word match for "pencile", my result doesn't match yours. Get rid of the \\b
if you want partial word matches.