Let me generate some random data with publicly available packages to demonstrate my issue. I am using the randomForestSRC
package to fit a survival random forest, and I am plotting the results as a ggplot using the ggRandomForests
package. The plot I get is produced by the last line of the code below.
I want my boxplots in the order "Yes", then "No", then "Maybe".
library(ggplot2)
library(ggRandomForests)
library(randomForestSRC)
library(survival)
df <- cancer # should grab the cancer data set from survival library
# Randomly generate some categorical data
var <- sample(c('Yes', 'No', 'Maybe'), nrow(df), replace = TRUE) # one value per row (228 rows)
df$var <- as.factor(var)
# Attempt to put them in the order I want (first yes, then no, then maybe)
df$var <- factor(df$var, levels = c("Yes", "No", "Maybe"))
levels(df$var) # Verify it is in order of "Yes", "No", "Maybe"
# Run survival random forests
rf <- rfsrc(Surv(time, status) ~ var, data = df,
            ntree = 1000, samptype = "swr", seed = 12345, membership = TRUE)
# Create a plot of the outcome, writing the plot object to a variable
pl <- plot.variable(rf, xvar.names = "var", partial = TRUE,
                    surv.type = "years.lost", time = 365, show.plots = FALSE)
# Create a ggplot with the plot object with the ggRandomForests package
# Also tack on some labels to demonstrate how this code works
plot(gg_partial(pl)) + xlab("Category") + ylab("Outcome")
If you got what I got, you should see the boxplots in alphabetical order (Maybe, No, Yes), which is not the order I wanted.
The only way I know to rearrange the order in a ggplot is to set the factor levels as I did above; I don't know of any other method for fixing this. Any ideas?
You could set the order via the limits argument of scale_x_discrete:
library(ggplot2)
library(ggRandomForests)
library(randomForestSRC)
library(survival)
# Assumes pl and df from the question's code are already in the workspace
plot(gg_partial(pl)) +
  labs(x = "Category", y = "Outcome") +
  scale_x_discrete(limits = levels(df$var))
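
To see the same mechanism in isolation, here is a minimal sketch that uses only ggplot2. The data frame toy, its columns category and outcome, and the vector wanted are hypothetical names made up for illustration; they are not part of the question's data. The x column is deliberately left as plain character, so without the limits argument ggplot2 would sort the categories alphabetically.

```r
library(ggplot2)

set.seed(1)  # make the toy data reproducible

# Hypothetical toy data: x is character, not a factor, so the default
# axis order would be alphabetical (Maybe, No, Yes).
toy <- data.frame(
  category = rep(c("Yes", "No", "Maybe"), each = 20),
  outcome  = c(rnorm(20, 10), rnorm(20, 12), rnorm(20, 14)),
  stringsAsFactors = FALSE
)

# The order we want on the x axis
wanted <- c("Yes", "No", "Maybe")

p <- ggplot(toy, aes(x = category, y = outcome)) +
  geom_boxplot() +
  scale_x_discrete(limits = wanted) +
  labs(x = "Category", y = "Outcome")

# Check the axis order without having to look at the rendered plot
layer_scales(p)$x$get_limits()
```

Because limits is set on the scale itself, the order does not depend on how the plotted data stores the variable, which is presumably why it helps when a plotting helper does not carry the factor levels through.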