While trying to plot my data, I found an unexpected behavior which caused my groups to be improperly rearranged and mislabeled.
In short, assigning a factor to several columns of a data frame at once causes it to be coerced to character, rather than kept as a factor. This seems related to the previously-answered question here, but I still don't understand why it happens.
# x is a factor
(x = factor(c("red", "blue", "green")))
# make a data frame
frame = data.frame("y"=1:3, "z"=1:3)
# replacing one column at a time yields a factor
frame[,"y"] = x; class(frame[,"y"])
frame[,"z"] = x; class(frame[,"z"])
# however, replacing >1 column at a time yields a character
frame[,c("y", "z")] = x
class(frame$y); class(frame$z)
Factors in R tend to cause me the most heartburn, somehow! The ordering, the combination of numerical value and character level, the general fiddliness... Anyway, I'm sure it's something I don't understand about the particular properties of data frames. Your help is appreciated!
So the problem is in the [<-.data.frame function, which is what runs when you do an assignment like
frame[,c("y", "z")] = x
When you specify more than one column, as you have, and the new value is not a list, the function converts the value to a matrix with the correct number of rows and columns and then splits that matrix into a list of columns. The problem with factors is that you cannot store them in a matrix: the factor attributes are dropped and you are left with a character matrix. You can see this if you try
matrix(x, nrow=3, ncol=2)
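To make the coercion visible, here is a rough sketch that imitates the matrix-then-split step by hand. It is only an approximation of what the assignment method does internally, and the names m and cols are made up for illustration.

```r
x <- factor(c("red", "blue", "green"))

# matrix() cannot hold a factor, so only the labels survive
m <- matrix(x, nrow = 3, ncol = 2)
typeof(m)      # "character"
is.factor(m)   # FALSE

# splitting the matrix by column gives one plain character vector per column
cols <- split(c(m), col(m))
sapply(cols, class)   # both "character"
```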
Again, this casting is happening because you are specifying more than one column, and the new value is not a list. So one way around this is to give a list as the new value instead.
frame[,c("y", "z")] <- list(x)
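As a quick check, the sketch below rebuilds the example and confirms that both columns stay factors when the new value is a list. The second variant, using lapply to build one list element per column, is just an alternative I'm suggesting, not something the assignment requires; frame2 and target are hypothetical names.

```r
x <- factor(c("red", "blue", "green"))
frame <- data.frame(y = 1:3, z = 1:3)

# a list value skips the matrix conversion, so the factor survives
frame[, c("y", "z")] <- list(x)
sapply(frame, class)   # both "factor"
levels(frame$y)        # "blue" "green" "red"

# alternative: supply one list element per target column explicitly
frame2 <- data.frame(y = 1:3, z = 1:3)
target <- c("y", "z")
frame2[target] <- lapply(target, function(nm) x)
sapply(frame2, is.factor)   # both TRUE
```

Spelling out one element per column avoids relying on the single list element being recycled across columns.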
So, it's a bit annoying that factors are so scared of matrices, but once you learn to master them, they really are a powerful feature of R. Don't be discouraged!