I wanted to adapt the method from another post (Reading in Unicode Emoji correctly into R) to check whether a Unicode string corresponds to an emoji, but I obviously haven't quite grasped how to use stringi correctly.
The first section of code is a simplification of the linked post and works as expected, with the first and last entries being replaced:
library(dplyr)
library(stringi)
a <- c("\U0001f600", "\U0001f603", "\U0001f604")
b <- c("grinning face", "grinning face with big eyes", "grinning face with smiling eyes")
v <- data.frame(lemma = c("\U0001f600", "\U0001f3fb", "hello", "asdfasdlkasdfkd", "\U0001f604"), stringsAsFactors = FALSE)
v %>% mutate(is_emoji = stri_replace_all_regex(lemma,
                                               pattern = a,
                                               replacement = b,
                                               vectorize_all = FALSE))
But my attempt to return a boolean does not work: in addition to the warning message "longer object length is not a multiple of shorter object length", the last value is not TRUE with the following code:
v %>% mutate(is_emoji = stri_detect_regex(lemma, pattern = a))
I have tried countless other variations, all without success.
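Use paste with collapse = '|' to combine the emoji into a single alternation pattern. stri_detect_regex is vectorized over both the strings and the pattern, so a pattern vector of length 3 is recycled element-wise against the 5 lemmas (hence the warning), and the last lemma is compared with a[2] rather than a[3]. With a single collapsed pattern, every lemma is tested against all the emoji: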
v %>% mutate(is_emoji = stri_detect_regex(lemma, pattern = paste(a, collapse = '|')))
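To make the difference concrete, here is a minimal sketch that runs both calls on the plain vectors from the question, without dplyr. The names `lemma`, `recycled`, and `combined` are only for illustration, and `suppressWarnings` is used just to hide the recycling warning.

```r
library(stringi)

a <- c("\U0001f600", "\U0001f603", "\U0001f604")
lemma <- c("\U0001f600", "\U0001f3fb", "hello", "asdfasdlkasdfkd", "\U0001f604")

# Element-wise matching: `a` is recycled as a[1], a[2], a[3], a[1], a[2],
# so the last lemma is compared with a[2] and does not match.
recycled <- suppressWarnings(stri_detect_regex(lemma, pattern = a))

# Single alternation pattern: every lemma is tested against all three emoji.
combined <- stri_detect_regex(lemma, pattern = paste(a, collapse = "|"))

print(recycled)  # TRUE FALSE FALSE FALSE FALSE
print(combined)  # TRUE FALSE FALSE FALSE TRUE
```

Note that the regex version also returns TRUE for a lemma that merely contains one of the emoji inside longer text. If only exact whole-string matches should count, `lemma %in% a` in base R may be a simpler alternative.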