R tweets with emojis

I scraped tweets from the Twitter API with the rtweet package, but I don't know how to work with text that contains emojis, because they appear in the form '\U0001f600' and every regex I have tried so far has failed. I can't get anything out of them.

For example:

 text = 'text text. \U0001f600'

My first grepl() attempt on it gives me FALSE.

A second attempt also gives me FALSE.

Another problem is that the emojis are often attached directly to the preceding word (for example, "i am here\U0001f600").

So how can I make R recognize emojis in that format? What pattern can I put in grepl() so that it returns TRUE for any emoji of that form?
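One thing worth checking first, assuming the tweets contain real emoji characters: '\U0001f600' is how R prints a single character (code point U+1F600), not a run of literal text, so a pattern that searches for the backslash sequence will not match. The quick base-R check below illustrates this; x is a made-up example string, not your data.

```r
# A single emoji written with R's \U escape: it is ONE character, code point U+1F600.
x <- "i am here\U0001f600"

nchar(x)                                 # 10: "i am here" is 9 characters, plus 1 emoji
utf8ToInt(x)[nchar(x)]                   # 128512, which is 0x1F600
sprintf("U+%X", utf8ToInt(x)[nchar(x)])  # "U+1F600"

# Searching for the escape text itself finds nothing, because the string does
# not contain a backslash followed by "U0001f600".
grepl("\\U0001f600", x, fixed = TRUE)    # FALSE

# Searching for the character itself does match.
grepl("\U0001f600", x, fixed = TRUE)     # TRUE
```

If your own data gives the opposite result (TRUE for the backslash pattern), the escapes are stored as literal text rather than as emoji characters, and they would need converting before any emoji-aware tool can see them.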


  • In R there tends to be a package for most things. In this case you can use textclean, which comes with the lexicon package and its many dictionaries. textclean has two functions you can use here, replace_emoji and replace_emoji_identifier:

    text = c("text text. \U0001f600", "i am here\U0001f600")
    # replace emoji with identifier:
    textclean::replace_emoji_identifier(text)
    [1] "text text. lexiconvygwtlyrpywfarytvfis " "i am here lexiconvygwtlyrpywfarytvfis "
    # replace emoji with text representation:
    textclean::replace_emoji(text)
    [1] "text text. grinning face " "i am here grinning face "

    From there you could use sentimentr to score the sentiment of the emojis, or quanteda for broader text analysis. If you just want to check whether an emoji is present, as in your expected output:

    grepl("lexicon[[:alpha:]]{20}", textclean::replace_emoji_identifier(text))
    [1] TRUE TRUE
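If you would rather avoid the package dependencies, a base-R sketch is possible with a PCRE character class over Unicode code points (perl = TRUE). This assumes the strings are UTF-8 encoded (string literals written with \U escapes are marked UTF-8; check Encoding() on your tweets if results look wrong). The code point ranges are my own assumption covering common emoji blocks, not the full Unicode emoji list: flags, for example, are not covered, and multi-code-point emojis such as skin-tone or ZWJ sequences are matched piece by piece. emoji_class and pad_emoji are hypothetical names.

```r
text <- c("text text. \U0001f600", "i am here\U0001f600", "no emoji here")

# Assumed emoji ranges (common blocks only, NOT a complete emoji list):
#   U+1F300-U+1FAFF  pictographs, emoticons, supplemental symbols
#   U+2600-U+27BF    miscellaneous symbols and dingbats
emoji_class <- "[\\x{1F300}-\\x{1FAFF}\\x{2600}-\\x{27BF}]"

# Detect: TRUE where a tweet contains at least one character in those ranges.
has_emoji <- grepl(emoji_class, text, perl = TRUE)
has_emoji
# [1]  TRUE  TRUE FALSE

# Extract the matched characters, e.g. to count or tabulate them per tweet.
emojis <- regmatches(text, gregexpr(emoji_class, text, perl = TRUE))
lengths(emojis)
# [1] 1 1 0

# Separate emojis stuck to the preceding word: pad each one with spaces,
# then collapse doubled spaces and trim the ends.
pad_emoji <- function(x) {
  x <- gsub(paste0("(", emoji_class, ")"), " \\1 ", x, perl = TRUE)
  trimws(gsub(" {2,}", " ", x))
}
padded <- pad_emoji(text)
# padded[2] is "i am here", a space, then the emoji; padded[3] is unchanged.
```

The regex route keeps the emoji characters themselves, while the textclean route above replaces them with identifiers or names that are easier to feed into tokenizers and sentiment tools.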