
stringi functions within dplyr


I wanted to adapt the method from another post (Reading in Unicode Emoji correctly into R) to check whether a Unicode string corresponds to an emoji, but I clearly haven't grasped how to use stringi correctly.

The first block of code is a simplification of the linked post and works as expected: the first and last entries are replaced.

library(dplyr)
library(stringi)

# Emoji characters and their descriptions, in matching order
a <- c("\U0001f600", "\U0001f603", "\U0001f604")
b <- c("grinning face", "grinning face with big eyes", "grinning face with smiling eyes")

v <- data.frame(lemma = c("\U0001f600", "\U0001f3fb", "hello", "asdfasdlkasdfkd", "\U0001f604"), stringsAsFactors = FALSE)
# vectorize_all = FALSE applies every pattern/replacement pair to every string
v %>% mutate(is_emoji = stri_replace_all_regex(lemma,
                       pattern = a,
                       replacement = b,
                       vectorize_all = FALSE))

But my attempt to return a boolean does not. In addition to the warning message "longer object length is not a multiple of shorter object length", the last value does not come back as TRUE with the following code:

v %>% mutate(is_emoji = stri_detect_regex(lemma, pattern = a))

I have tried countless other variations, all without success.


Solution

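  • stri_detect_regex is vectorized over both the strings and the patterns, so the three patterns in a are recycled across the five rows (hence the warning) and the last row is tested against a[2] instead of a[3]. Use paste with collapse = '|' to combine the patterns into a single alternation that is tested against every row: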

    v %>% mutate(is_emoji = stri_detect_regex(lemma, pattern = paste(a, collapse = '|')))
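
To make the difference visible outside of dplyr, here is a minimal sketch that runs both calls on a plain character vector. The names lemma, recycled and combined are only illustrative and do not come from the original post. The warning is suppressed purely to keep the output short.

```r
library(stringi)

a <- c("\U0001f600", "\U0001f603", "\U0001f604")
lemma <- c("\U0001f600", "\U0001f3fb", "hello", "asdfasdlkasdfkd", "\U0001f604")

# Element-wise matching: `a` is recycled to a[1], a[2], a[3], a[1], a[2],
# so lemma[5] is tested against a[2] and does not match.
recycled <- suppressWarnings(stri_detect_regex(lemma, pattern = a))

# One alternation pattern: every string is tested against all three emoji.
combined <- stri_detect_regex(lemma, pattern = paste(a, collapse = "|"))

print(recycled)
# [1]  TRUE FALSE FALSE FALSE FALSE
print(combined)
# [1]  TRUE FALSE FALSE FALSE  TRUE
```

Note that the collapsed pattern is still a regular expression, so it reports TRUE for any string that contains one of the emoji, not only for strings that consist of exactly one emoji.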