
Replace character in a dataframe with another character


I want to change the character "F" to "X" in a dataframe. Please see below.

    df <- data.frame(N=c(1,2,3,4,5,6),
                     CAT=c('A','B','C','D','E','F'))
    df

Result:
      N CAT
    1 1   A
    2 2   B
    3 3   C
    4 4   D
    5 5   E
    6 6   F

I've run this code and it doesn't work:

    df$CAT[df$CAT == 'F'] <- 'X'

    Error in `$<-.data.frame`(`*tmp*`, code, value = character(0)) : 
      replacement has 0 rows, data has 6

This code seems to work on other data I've imported via csv. Is there a reason why it doesn't work with this specific dataframe I've created? Any help much appreciated.


Solution

  • This is the proverbial stringsAsFactors = FALSE issue. As of R 4.0.0 (released in 2020) it is no longer a problem, because the default changed to stringsAsFactors = FALSE. For many years before that, users struggled to remember that data.frame() (and as.data.frame(), for that matter) converts character vectors to factors by default.

    What happens then is that you are trying to assign a value that is not among the factor's existing levels, and R does not add new levels that way. If you did not intend to create a factor, you can simply change the code that creates the data frame:

    df <- data.frame(N=c(1,2,3,4,5,6),
                     CAT=c('A','B','C','D','E','F'),
                     stringsAsFactors = FALSE)
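
    As a rough illustration of the difference described above, the sketch below builds the same data both ways and tries the assignment from the question on each. The names df_fac and df_chr are made up for this example; on a factor column the assignment would typically produce an "invalid factor level" warning and store NA, while on a character column it just replaces the value.

```r
# Illustrative names (df_fac, df_chr); they do not appear in the question.
df_fac <- data.frame(N = c(1, 2, 3, 4, 5, 6),
                     CAT = c('A', 'B', 'C', 'D', 'E', 'F'),
                     stringsAsFactors = TRUE)
df_chr <- data.frame(N = c(1, 2, 3, 4, 5, 6),
                     CAT = c('A', 'B', 'C', 'D', 'E', 'F'),
                     stringsAsFactors = FALSE)

class(df_fac$CAT)  # factor column
class(df_chr$CAT)  # character column

# 'X' is not an existing level, so R warns and stores NA instead of 'X'.
df_fac$CAT[df_fac$CAT == 'F'] <- 'X'
is.na(df_fac$CAT[6])

# On a character column the same assignment replaces the value directly.
df_chr$CAT[df_chr$CAT == 'F'] <- 'X'
df_chr$CAT[6]
```

    Checking class(df$CAT) first is a quick way to tell which of the two cases you are in.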
    

    If, however, you did want a factor, here is how you can modify its levels to recode one of them:

    df <- data.frame(N=c(1,2,3,4,5,6),
                     CAT=c('A','B','C','D','E','F'),
                     stringsAsFactors = TRUE)
    df
    str(df)
    #> 'data.frame':    6 obs. of  2 variables:
    #> $ N  : num  1 2 3 4 5 6
    #> $ CAT: Factor w/ 6 levels "A","B","C","D",..: 1 2 3 4 5 6
    
    levels(df$CAT)[levels(df$CAT)=="F"] <- "X"
    
    df
    
    #>   N CAT
    #> 1 1   A
    #> 2 2   B
    #> 3 3   C
    #> 4 4   D
    #> 5 5   E
    #> 6 6   X
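
    If the data frame already exists and you would rather not keep the factor, another possible route is to convert the column to character before replacing. This is only a sketch under that assumption, not something from the answer above; it reuses the df from the question and base R's as.character().

```r
# Start from a data frame whose CAT column is a factor.
df <- data.frame(N = c(1, 2, 3, 4, 5, 6),
                 CAT = c('A', 'B', 'C', 'D', 'E', 'F'),
                 stringsAsFactors = TRUE)

# Convert the factor to plain character strings.
df$CAT <- as.character(df$CAT)

# The replacement from the question now works on the character column.
df$CAT[df$CAT == 'F'] <- 'X'
df$CAT
```

    Note that this drops the factor structure, so use levels() as shown earlier if the column needs to stay a factor.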