Tags: r, dataframe, factors, levels

How to keep levels when using factor variables in a new dataframe


I expect that this is a basic question. However, I looked through all the suggested posts and searched on my own, and I couldn't find the answer. I just want to know why I appear to lose the levels when I create a new dataframe from factor variables in an existing one. Why does this happen, and how can I keep the levels of a factor variable? Here's a reproducible example to demonstrate:

data(iris)
str(iris) # Species variable is of the class factor
iris.lm <- lm(Petal.Width ~ Species, iris) # Fit a simple model
summary(iris.lm) # Levels are displayed

# Now I make a new dataframe to do some fit quality checks
iris.plots <- as.data.frame(cbind(iris$Species, iris$Petal.Width, fitted(iris.lm),residuals(iris.lm)))
names(iris.plots) <- c("Species", "Observed", "Predicted", "Residuals")

# In the scatter plot to view Residuals by predictor (Species, of factor class), I have not maintained the levels.
plot(x = iris.plots$Species, y = iris.plots$Residuals)
head(iris.plots) # Confirming that I "lost" the levels

Thanks for your help!


Solution

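  • When you call cbind on plain vectors, the result is a matrix, not a data frame. A factor is stored internally as integer codes plus a levels attribute, and cbind keeps only the codes, so Species becomes the numbers 1, 2 and 3 with no level labels. as.data.frame() then just turns those codes into a numeric column. To prevent this you can, for example, pass the iris$Species column as a data frame: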

    iris.plots <- cbind(as.data.frame(iris$Species), iris$Petal.Width, fitted(iris.lm),residuals(iris.lm))
    

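    Now cbind sees that one of its arguments is a data frame and uses the data frame method (cbind.data.frame), which keeps Species as a factor with its levels. The resulting columns get automatically generated names, so rename them with names() as in the question.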
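A minimal base-R sketch, assuming only the built-in iris data, that shows the mechanism described above and two ways to keep the levels. The object names m and via_cbind are illustrative, and building the frame directly with data.frame() is an alternative not mentioned in the original answer.

```r
# Sketch: why cbind() on plain vectors drops factor levels, and two ways
# to keep them. Uses only base R and the built-in iris data set.
data(iris)
iris.lm <- lm(Petal.Width ~ Species, data = iris)

# 1. cbind() on vectors returns a matrix: the factor contributes only its
#    underlying codes (1, 2, 3), and the levels attribute is discarded.
m <- cbind(iris$Species, iris$Petal.Width)
is.matrix(m)      # TRUE: a matrix, not a data frame
unique(m[, 1])    # codes only, no species names

# 2. If any argument is a data frame, cbind() uses the data frame method,
#    so Species stays a factor (this is the approach from the answer).
via_cbind <- cbind(as.data.frame(iris$Species), iris$Petal.Width,
                   fitted(iris.lm), residuals(iris.lm))
names(via_cbind) <- c("Species", "Observed", "Predicted", "Residuals")
levels(via_cbind$Species)  # "setosa" "versicolor" "virginica"

# 3. Alternative: build the data frame directly with data.frame(). There is
#    no intermediate matrix, and the columns are named in the same call.
iris.plots <- data.frame(
  Species   = iris$Species,
  Observed  = iris$Petal.Width,
  Predicted = fitted(iris.lm),
  Residuals = residuals(iris.lm)
)
str(iris.plots)
```

With Species kept as a factor, the question's call plot(x = iris.plots$Species, y = iris.plots$Residuals) draws one box per species instead of a scatter plot over the codes 1 to 3.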