
Can't remove a row with an empty value or strip whitespace from a data frame using the common methods


I have a data frame, but I cannot remove row 8:

PKV_clean
   ID            x
1   1      scharfkantig
2   1                 t
4   1    seit paartagen
8   1                  
10  1          knirscht
11  1  schiene empohlen
12  1           meldet 

Neither of these works:

PKV_clean <- PKV_clean[!apply(is.na(PKV_clean) | PKV_clean == " ", 1, all),]

PKV_clean <- PKV_clean[!(is.na(PKV_clean$x) | PKV_clean$x ==""), ]

Both are meant to remove NAs and empty strings.

Nor can I remove the single trailing whitespace in row 12 when I build a corpus:

PKV_clean <-  tm_map(PKV_clean, stripWhitespace)

These functions run without any error message, but they don't remove anything. Could there be hidden characters that the viewer doesn't show?

Edit1:

dput(PKV_clean)
structure(list(ID = c("1", "1", "1", "1", "1", "1", "1"), x = c("    scharfkantig", 
"t", " seit paartagen", " ", " knirscht", " schiene empohlen", 
"  meldet ")), row.names = c(1L, 2L, 4L, 8L, 10L, 11L, 12L), class = "data.frame")

Solution

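  • Your column x contains a lot of unnecessary whitespace. Row 8 is actually " " (a single space), not "", which is why the comparison with "" matches nothing. The apply() attempt fails for a different reason: all requires every column in the row to be blank, and ID never is. stripWhitespace from tm only collapses runs of whitespace into a single blank, so a lone leading or trailing space survives. Trim the whitespace first, then filter out the empty strings: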

    library(dplyr)
    library(stringr)
    PKV_clean %>%
      mutate(x = str_trim(x)) %>%  # strip leading and trailing whitespace
      filter(x != "")              # drop rows that are now empty
    
      ID                x
    1  1     scharfkantig
    2  1                t
    3  1   seit paartagen
    4  1         knirscht
    5  1 schiene empohlen
    6  1           meldet
    

    More directly, you can just do this (if you don't care about the whitespace in the other parts of the column):

    PKV_clean[PKV_clean$x != " ", ]
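
If you would rather not load dplyr and stringr, the same trim-then-filter idea can be sketched in base R. This is only a minimal sketch: it rebuilds the data from the dput() output in the question, and the helper names raw_lengths and keep are my own. It also guards against NA, since the question mentions wanting to drop those as well.

```r
# Rebuild the data frame from the dput() output in the question
PKV_clean <- data.frame(
  ID = c("1", "1", "1", "1", "1", "1", "1"),
  x = c("    scharfkantig", "t", " seit paartagen", " ", " knirscht",
        " schiene empohlen", "  meldet "),
  row.names = c(1L, 2L, 4L, 8L, 10L, 11L, 12L),
  stringsAsFactors = FALSE
)

# Diagnostic: a truly empty string has length 0, but row 8 has length 1
raw_lengths <- nchar(PKV_clean$x)

# Strip leading and trailing whitespace (spaces, tabs, newlines)
PKV_clean$x <- trimws(PKV_clean$x)

# Keep only rows whose x is neither NA nor an empty string
keep <- !is.na(PKV_clean$x) & nzchar(PKV_clean$x)
PKV_clean <- PKV_clean[keep, ]
```

Unlike the dplyr version, base R subsetting keeps the original row names (1, 2, 4, 10, 11, 12), so row 8 simply disappears from the sequence.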