I expect that this is a basic question; however, I looked through all the suggested posts, searched myself, and couldn't find the answer. I just want to know why, when I create a new data frame from factor variables in an existing one, I appear to lose the levels. Why does this happen, and how can I keep the levels of a factor variable? Here's a reproducible example to demonstrate:
data(iris)
str(iris) # Species variable is of the class factor
iris.lm <- lm(Petal.Width ~ Species, iris) # Fit a simple model
summary(iris.lm) # Levels are displayed
# Now I make a new dataframe to do some fit quality checks
iris.plots <- as.data.frame(cbind(iris$Species, iris$Petal.Width, fitted(iris.lm),residuals(iris.lm)))
names(iris.plots) <- c("Species", "Observed", "Predicted", "Residuals")
# In the scatter plot to view Residuals by predictor (Species, of factor class), I have not maintained the levels.
plot(x = iris.plots$Species, y = iris.plots$Residuals)
head(iris.plots) # Confirming that I "lost" the levels
Thanks for your help!
When you call cbind on plain vectors (even if one of them is a factor), the result is a matrix. A matrix can hold only one type, so the factor is reduced to its integer codes and the level labels are lost. To prevent this you can, for example, pass the iris$Species column as a data frame:
iris.plots <- cbind(as.data.frame(iris$Species), iris$Petal.Width, fitted(iris.lm),residuals(iris.lm))
Now cbind sees a data.frame among its arguments (here, the first one) and uses its data frame method, which preserves the levels.
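As a quick check of the explanation above, and as one possible alternative rather than the only fix, here is a minimal sketch that first shows what cbind returns for plain vectors and then builds the data frame directly with data.frame(), skipping cbind altogether. The object name m is just an illustrative choice; the column names are the ones from the question.

```r
# Fit the same model as in the question
iris.lm <- lm(Petal.Width ~ Species, data = iris)

# cbind() on plain vectors returns a numeric matrix: the factor is
# reduced to its integer codes and the level labels are dropped
m <- cbind(iris$Species, iris$Petal.Width)
is.matrix(m)
is.numeric(m)

# Alternative: build the data frame directly, naming the columns up front.
# data.frame() keeps each column's own class, so Species stays a factor.
iris.plots <- data.frame(
  Species   = iris$Species,
  Observed  = iris$Petal.Width,
  Predicted = fitted(iris.lm),
  Residuals = residuals(iris.lm)
)

is.factor(iris.plots$Species)
levels(iris.plots$Species)

# With a factor on the x axis, plot() draws one box plot per level
plot(x = iris.plots$Species, y = iris.plots$Residuals)
```

Naming the columns inside data.frame() also makes the separate names(iris.plots) <- ... step unnecessary.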