I hope all is well. I need to find which rows contain:
pro, with a space before it and no characters after it;
or certificat, with a space before it, where characters may follow (such as certificate or certification);
or a digit (one or more).

A piece of the data is:
library(dplyr)  # provides %>% and select()

df_new <- data.frame(
  given_info = c('SA12 is given', 'he is Pro writer',
                 'she programmed', 'why not having an ra31',
                 'his bag missing', 'pa12 and certificate are given',
                 'schedule is ready', 'certification was awarded',
                 'meeting is canceled'))

df_new %>% select(given_info)
                      given_info
1                  SA12 is given
2               he is Pro writer
3                 she programmed
4         why not having an ra31
5                his bag missing
6 pa12 and certificate are given
7              schedule is ready
8      certification was awarded
9            meeting is canceled
Hence, the desired output would look like this:
                      given_info string_detected
1                  SA12 is given               1
2               he is Pro writer               1
3                 she programmed               0
4         why not having an ra31               1
5                his bag missing               0
6 pa12 and certificate are given               1
7              schedule is ready               0
8      certification was awarded               1
9            meeting is canceled               0
Something like this:
(^|\\s)[Pp]ro(\\s|$)
... matches the word "Pro" or "pro" when it is surrounded by whitespace or sits at the beginning or end of the string.

(^|\\s)[Cc]ertificat(e|ion)?(\\s|$)
... matches "[Cc]ertificat", "[Cc]ertificate", or "[Cc]ertification" when it is surrounded by whitespace or sits at the beginning or end of the string.

\\d+
... matches any sequence of one or more digits.

library(dplyr)
library(stringr)

# Flag a row with 1 if any of the three patterns matches, otherwise 0
df_new %>%
  mutate(string_detected = as.integer(
    str_detect(given_info, "(^|\\s)[Pp]ro(\\s|$)") |
    str_detect(given_info, "(^|\\s)[Cc]ertificat(e|ion)?(\\s|$)") |
    str_detect(given_info, "\\d+")))
                      given_info string_detected
1                  SA12 is given               1
2               he is Pro writer               1
3                 she programmed               0
4         why not having an ra31               1
5                his bag missing               0
6 pa12 and certificate are given               1
7              schedule is ready               0
8      certification was awarded               1
9            meeting is canceled               0
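
The three checks can also be folded into a single alternation. The following is only a minimal sketch in base R, not part of the approach above: the variable name `pattern` is made up for illustration, and using `ignore.case = TRUE` is an assumption that goes slightly beyond `[Pp]` and `[Cc]`, since it would also accept spellings such as "PRO".

```r
# Same sample data as in the question
df_new <- data.frame(
  given_info = c('SA12 is given', 'he is Pro writer',
                 'she programmed', 'why not having an ra31',
                 'his bag missing', 'pa12 and certificate are given',
                 'schedule is ready', 'certification was awarded',
                 'meeting is canceled'))

# Hypothetical combined pattern (an assumption, not from the original answer):
#   - standalone "pro", or "certificat" optionally followed by "e" or "ion",
#     delimited by whitespace or the start/end of the string
#   - or any single digit anywhere in the string
pattern <- "(^|\\s)(pro|certificat(e|ion)?)(\\s|$)|\\d"

# grepl() returns TRUE/FALSE per row; as.integer() turns that into 1/0
df_new$string_detected <- as.integer(
  grepl(pattern, df_new$given_info, ignore.case = TRUE, perl = TRUE)
)

df_new
```

If any characters should be allowed after certificat (for example "certificates"), one option might be to drop the trailing (\\s|$) for that branch only, as in "(^|\\s)pro(\\s|$)|(^|\\s)certificat|\\d".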