
One of the factor's levels is an empty string; how can I replace it with a non-missing value?


The data frame AEbySOC contains two columns: a factor SOC with character levels and an integer count Count:

> str(AEbySOC)
'data.frame':   19 obs. of  2 variables:
 $ SOC  : Factor w/ 19 levels "","Blood and lymphatic system disorders",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Count: int  25 50 7 3 1 49 49 2 1 9 ...

One of the levels of SOC is an empty character string:

> l = levels(AEbySOC$SOC)
> l[1]
[1] ""

I want to replace this level with a non-empty string, say, "Not specified". This does not work:

> library(plyr)
> revalue(AEbySOC$SOC, c(""="Not specified"))
Error: attempt to use zero-length variable name

Neither does this:

> AEbySOC$SOC[AEbySOC$SOC==""] = "Not specified"
Warning message:
In `[<-.factor`(`*tmp*`, AEbySOC$SOC == "", value = c(NA, 2L, 3L,  :
  invalid factor level, NA generated

What's the right way to implement this? I appreciate any input/comment.


Solution

  • levels(AEbySOC$SOC)[1] <- "Not specified"
    

    Created a toy example:

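    # stringsAsFactors = TRUE is needed in R >= 4.0.0, where the default is FALSE
    df <- data.frame(a = c("", "a", "b"), stringsAsFactors = TRUE)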
    
    df
    #  a
    #1  
    #2 a
    #3 b
    
    levels(df$a)
    #[1] ""  "a" "b"
    
    levels(df$a)[1] <- "Not specified"
    
    levels(df$a)
    #[1] "Not specified" "a"             "b" 
    
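Renaming a level relabels every observation that carries it, because a factor stores integer codes that index into its levels. A minimal sketch of that behaviour, assuming a hypothetical vector `soc` that stands in for AEbySOC$SOC (the values are made up for illustration):

```r
# Hypothetical stand-in for AEbySOC$SOC; the values are illustrative only.
soc <- factor(c("", "Cardiac disorders", "", "Eye disorders"))

# Remember the underlying integer codes before renaming.
codes_before <- as.integer(soc)

# Rename the first level (the empty string sorts first).
levels(soc)[1] <- "Not specified"

# The codes are unchanged; only the label attached to code 1 differs.
identical(codes_before, as.integer(soc))

# Both observations that were "" now read "Not specified".
as.character(soc)
```

Since only the label changes, no NA is introduced and no row needs to be reassigned.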

    EDIT

    As per the OP's comments, if the level has to be located by its value rather than by its position, we can use

    levels(AEbySOC$SOC)[levels(AEbySOC$SOC) == ""] <- "Not specified"
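The direct assignment in the question produced NA because "Not specified" was not yet one of the factor's levels. If assigning by value is preferred over renaming, one possible alternative is to add the level first and drop the unused one afterwards. This is only a sketch, assuming a hypothetical factor `x` with made-up values:

```r
# Hypothetical factor with an empty-string level; values are illustrative only.
x <- factor(c("", "a", "b", ""))

# Register the new level so that assigning it is valid.
levels(x) <- c(levels(x), "Not specified")

# Now the assignment from the question no longer generates NA.
x[x == ""] <- "Not specified"

# Remove the empty-string level, which is no longer used.
x <- droplevels(x)

levels(x)
```

Unlike renaming, this approach appends the new level at the end of the level order instead of keeping it in the first position, which may matter for plots or tables that follow the level order.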