
Removing emojis in R


I am trying to remove emojis from customers' reviews data in R. Emojis appear in this format <U+0001F603>.

For example, this is how a review appears in the dataset: "It's mind-blowing! <U+0001F603>" And I want to remove the <U+0001F603>.

I have tried gsub and iconv, but neither worked.

I really appreciate any help you can provide.


Solution

  • It depends a bit on what exactly your strings look like.

    In your case, a plain regex may work. Replacing the emoji with a space may be preferable to removing it outright; otherwise you risk merging two words into one.

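    # Match only literal <U+XXXX> tokens; a greedy '<U.*>' would also
    # delete any text sitting between two emojis on the same line.
    stringr::str_replace_all(string = "It's mind-blowing! <U+0001F603>",
                             pattern = '<U\\+[0-9A-Fa-f]+>',
                             replacement = " ")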
    

    You may also want to wrap the result in stringr::str_squish() to drop redundant spaces.
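
Putting the two steps together, here is a minimal sketch that replaces each token with a space and then squishes the result. It assumes the reviews really contain the literal text `<U+XXXX>`, with four to eight hex digits. The helper name `remove_emoji_tokens` and the second sample review are made up for illustration and are not from the question.

```r
library(stringr)

# Hypothetical helper (name is illustrative): replace literal "<U+XXXX>"
# tokens with a space, then collapse the leftover whitespace.
remove_emoji_tokens <- function(x) {
  cleaned <- str_replace_all(x, "<U\\+[0-9A-Fa-f]{4,8}>", " ")
  str_squish(cleaned)
}

# Sample data: the first review is from the question, the second is invented
# to show that text between two tokens survives.
reviews <- c(
  "It's mind-blowing! <U+0001F603>",
  "Great<U+0001F44D>value <U+0001F603> would buy again"
)

cleaned_reviews <- remove_emoji_tokens(reviews)
print(cleaned_reviews)
```

If this leaves the reviews unchanged, the strings may hold real emoji characters that R is only displaying as `<U+XXXX>` in the console. A pattern that looks for a literal `<U+` cannot match those, so you would need a character-class-based pattern instead.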