I am trying to remove emojis from customer review data in R. The emojis appear as escaped text in this format: <U+0001F603>.
For example, this is how a review appears in the dataset: "It's mind-blowing! <U+0001F603>", and I want to remove the <U+0001F603>.
I have tried gsub and iconv, but neither worked.
I really appreciate any help you can provide.
It depends a bit on what exactly your strings look like.
In your case, a plain regex may work. Replacing the emoji with a space may be preferable to just removing it; otherwise you risk ending up with two words merged into one.
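# match only "<U+" followed by hex digits; a greedy '<U.*>' would also eat any text between two emojis
stringr::str_replace_all(string = "It's mind-blowing! <U+0001F603>",
                         pattern = "<U\\+[0-9A-Fa-f]+>",
                         replacement = " ")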
You may also want to wrap the result in stringr::str_squish() to drop redundant spaces.
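Putting the two steps together, here is a minimal sketch. It assumes the emojis are stored as literal "<U+...>" text rather than as real emoji characters. The helper name remove_emoji_codes and the second sample review are made up for illustration.

```r
library(stringr)

# Hypothetical helper: strips literal "<U+XXXX>" escape text from a character vector.
remove_emoji_codes <- function(x) {
  # Match "<U+", one or more hex digits, then ">"; replace each match with a space
  cleaned <- str_replace_all(x, "<U\\+[0-9A-Fa-f]+>", " ")
  # Collapse runs of whitespace and trim leading/trailing spaces
  str_squish(cleaned)
}

reviews <- c(
  "It's mind-blowing! <U+0001F603>",
  "Great<U+0001F44D>value <U+2764> would buy again"  # made-up sample
)

remove_emoji_codes(reviews)
```

Because the pattern only matches hex digits between "<U+" and ">", the words between two emoji codes in the second review are kept, and the space replacement stops "Great" and "value" from being merged.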