Factor becomes character when assigned to data frame


I have a factor and want to make it a column in a data frame. I was surprised to find that it appears to be converted to character automatically, even though I specified stringsAsFactors = TRUE.

Here is the MWE:

a <- data.frame(dummy=1:5, stringsAsFactors = TRUE)
b <- as.factor(c("Monday", "Tuesday", "Monday", "Thursday", "Tuesday"))
a["d"] <- b

> levels(a["d"])
NULL

How do I do the assignment so I get an actual factor, keeping the original levels?

I cannot use solutions that convert the column to a factor afterwards: in the example, that would produce the levels 'Monday Thursday Tuesday', whereas I have prepared a factor that has all the proper levels in the required order (in this example, all the days of the week in sequence).


Solution

  • The column is still a factor; the difference lies in how it is extracted. a['d'] is still a data.frame with 'd' as its only column, and levels() returns NULL for a data.frame, whereas a[, 'd'], a[['d']], and a$d all extract the 'd' column as a vector of class factor. str() shows the difference:

    str(a['d'])
    #'data.frame':   5 obs. of  1 variable:
    #$ d: Factor w/ 3 levels "Monday","Thursday",..: 1 3 1 2 3
    
    str(a[['d']])
    #Factor w/ 3 levels "Monday","Thursday",..: 1 3 1 2 3
    
    levels(a["d"])
    #NULL
    
    levels(a[["d"]])
    #[1] "Monday"   "Thursday" "Tuesday"
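
To address the concern about prepared levels, here is a minimal sketch, assuming the factor was built with all seven weekdays as levels (the `days` vector is a hypothetical stand-in for the levels the question describes). It suggests that the assignment keeps every level, including unused ones, in the given order:

```r
# Hypothetical level set: all weekdays in calendar order
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")

# Factor with explicit levels; only three of them occur in the data
b <- factor(c("Monday", "Tuesday", "Monday", "Thursday", "Tuesday"),
            levels = days)

a <- data.frame(dummy = 1:5)
a["d"] <- b                     # same assignment as in the question

is.factor(a[["d"]])             # the column itself is a factor
is.data.frame(a["d"])           # single brackets return a data frame
levels(a[["d"]])                # all seven levels, in the prepared order
```

Extracting with `a[["d"]]` or `a$d` before calling levels() should be enough; no conversion after the assignment is needed.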