
"others" turns into NA after setting factor levels


I have a data.frame (DL), and one of its columns is fruit, with values like c("apple", "lemon", "orange", "others"). I want to change the level order of this column so that the legend in my plot follows the order I want. Here is my code:

DL$fruit <- factor(DL$fruit, levels=c("lemon", "apple",  "orange", "others"))

But when I view the data afterwards with View(DL), the "others" values have turned into NA, and when I plot it with ggplot, no bar is shown for "others". Does anyone know what is going on and how to fix it? Thanks.


Solution

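  • This happens when some values in your data don't exactly match any of the levels you supply: factor() converts every value that is not one of the levels to NA. A common cause is data that are not quite clean, for example extra whitespace around the input values.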

    Here's an example:

    fruit <- c("apple", "lemon", "orange", "others", "others ") ## note the last two values
    factor(fruit, levels=c("lemon", "apple",  "orange", "others"))
    # [1] apple  lemon  orange others <NA>  
    # Levels: lemon apple orange others
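Before calling factor(), a quick way to see which values will become NA is to compare the distinct values in the data against the levels you plan to use. This is a minimal sketch reusing the example vector; `wanted` is a hypothetical name for the desired level order.

```r
fruit <- c("apple", "lemon", "orange", "others", "others ")
wanted <- c("lemon", "apple", "orange", "others")  # hypothetical: desired level order

# Values present in the data but absent from the level set; factor() turns these into NA
setdiff(unique(fruit), wanted)
# [1] "others "
```

If this returns character(0), every value matches a level and factor() will not introduce any NAs.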
    

    Now, let's strip out the whitespace:

    newFruit <- gsub("^\\s+|\\s+$", "", fruit)  ## strip leading and trailing whitespace
    factor(newFruit, levels = c("lemon", "apple",  "orange", "others"))
    # [1] apple  lemon  orange others others
    # Levels: lemon apple orange others
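Since R 3.2.0, base R also provides trimws(), which does essentially the same job as the gsub() call and can be applied directly when setting the levels. This sketch assumes a hypothetical stand-in for the asker's DL data frame containing one padded value.

```r
# Hypothetical stand-in for DL, with one value that has a trailing space
DL <- data.frame(fruit = c("apple", "lemon", "orange", "others", "others "),
                 stringsAsFactors = FALSE)

# Trim whitespace first, then apply the desired level order
DL$fruit <- factor(trimws(DL$fruit),
                   levels = c("lemon", "apple", "orange", "others"))

levels(DL$fruit)
# [1] "lemon"  "apple"  "orange" "others"
anyNA(DL$fruit)
# [1] FALSE
```

Because the level order is now lemon, apple, orange, others and no value falls outside it, a ggplot legend built from this column should follow that order and include "others".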
    

    If you want to inspect the source data for whitespace, it often helps to use print() with quote = TRUE:

    print(fruit, quote = TRUE)
    # [1] "apple"   "lemon"   "orange"  "others"  "others "
    

    Alternatively, grepl() can flag the values that have leading or trailing whitespace:

    grepl("^\\s+|\\s+$", fruit)
    # [1] FALSE FALSE FALSE FALSE  TRUE
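To locate the problem entries rather than just flag them, the grepl() result can be combined with which() and subsetting. A minimal sketch on the same example vector; `hasPad` is a hypothetical helper name.

```r
fruit <- c("apple", "lemon", "orange", "others", "others ")

# TRUE where a value has leading or trailing whitespace
hasPad <- grepl("^\\s+|\\s+$", fruit)

# Positions of the padded values (row numbers, if this were a data frame column)
which(hasPad)
# [1] 5

# The distinct offending values, quoted so the whitespace is visible
print(unique(fruit[hasPad]), quote = TRUE)
# [1] "others "
```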