Empty string with length > 0 in R

I ran into a weird case in my data frame while working with emojis in R. I want to delete all emojis before running a sentiment analysis. After removing them, some strings should be empty but aren't. What is the problem? I would like to replace the empty fields with NA. Here is a small example:


library(dplyr)
library(stringr)

df <- data.frame(x = c("test","♥️♥️🙌♥"))


df_new <- df |>
  mutate(x = str_remove_all(x, "[[:emoji:]]"))


Now I would like to use the following command, but it has no effect, because the string is not actually empty:

tmp <- df_new |>
  mutate(x = na_if(x, ""))

What is the problem here, and how can I solve it?

Thank you in advance.



  • If you want to remove all non-letter characters while supporting any language, and your x values are already split into single words, you can simply do:

    library(dplyr)
    library(stringi)

    df <- data.frame(x = c("test","♥️♥️🙌♥"))
    df %>%
      mutate(x = stri_extract_all(x, charclass = "\\p{L}"))
    1 test
    2   NA

    If you have strings with multiple words, you can slightly adapt the above and use this instead:

    df <- data.frame(x = c("Ελλάδα means Greece ♥️ ️", "test","♥️♥️🙌♥"))
    df %>%
      rowwise() %>%  # work row by row; mutate() refuses to modify a grouping column
      mutate(x = paste(stri_extract_all(x, charclass = "\\p{L}")[[1]], collapse = " "))
    1 Ελλάδα means Greece
    2 test           
    3 NA
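
A likely cause of the non-empty string in the question: "♥️" is two code points, the heart U+2665 followed by the invisible variation selector U+FE0F. The selector does not carry the Unicode Emoji property, so `[[:emoji:]]` removes the heart and leaves the selector behind. The sketch below is one way to check this and to clean up; it writes the test string with escapes so the code points are explicit, and stripping U+200D (zero-width joiner) and trimming whitespace are extra precautions that the question does not strictly need.

```r
library(dplyr)
library(stringr)

# "\u2665\ufe0f" is the heart followed by U+FE0F (variation selector-16)
df <- data.frame(x = c("test", "\u2665\ufe0f\u2665\ufe0f\U0001F64C\u2665"))

# Step 1: reproduce the leftover and look at its code points
leftover <- str_remove_all(df$x[2], "[[:emoji:]]")
nchar(leftover)                           # greater than 0, although nothing is visible
format(as.hexmode(utf8ToInt(leftover)))   # hex values of the remaining code points

# Step 2: remove the emojis, then the invisible leftovers, then map "" to NA
df_new <- df |>
  mutate(
    x = str_remove_all(x, "[[:emoji:]]"),
    x = str_remove_all(x, "[\\x{FE0F}\\x{200D}]"),  # variation selector, zero-width joiner
    x = na_if(str_trim(x), "")                      # whitespace-only strings also become NA
  )

df_new
```

With this, `df_new$x` stays an ordinary character column and the second row is a real `NA`. One caveat: the Emoji property also covers the digits 0-9, `#` and `*`, so `[[:emoji:]]` is expected to strip those from normal text as well, which may matter for a sentiment analysis.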