r, ggplot2, random-forest, boxplot, survival-analysis

How do I reorder the levels in this ggplot when the package I'm using forces the items into alphabetical order?


Let me randomly generate some data with available packages to demonstrate my issue. I am using the randomForestSRC package to run some survival random forests, and I am plotting the results of the random forest as a ggplot using the ggRandomForests package. You'll see the plot I get at the very end.

I want my boxplots in the order "Yes", then "No", then "Maybe".

library(ggplot2)
library(ggRandomForests)
library(randomForestSRC)
library(survival)

df <- cancer # should grab the cancer data set from survival library

# Randomly generate some categorical data
var <- sample(c('Yes', 'No', 'Maybe'), nrow(df), replace = TRUE)
df$var <- as.factor(var)

# Attempt to put them in the order I want (first yes, then no, then maybe)
df$var <- factor(df$var, levels = c("Yes", "No", "Maybe"))
levels(df$var) # Verify it is in order of "Yes", "No", "Maybe"

# Run survival random forests
rf <- rfsrc(Surv(time, status) ~ var, data = df,
            ntree = 1000, samptype = "swr", seed = 12345, membership = TRUE)

# Create a plot of the outcome, writing the plot object to a variable
pl <- plot.variable(rf, xvar.names = "var", partial = TRUE, 
                    surv.type = "years.lost", time = 365, show.plots = FALSE)

# Create a ggplot with the plot object with the ggRandomForests package
# Also tack on some labels to demonstrate how this code works
plot(gg_partial(pl)) + xlab("Category") + ylab("Outcome")

If you got what I got, you should see the boxplots in alphabetical order (Maybe, No, Yes), which is of course NOT the order I wanted.

The only way I know to rearrange the order in a ggplot is to set the factor levels, as I did above; I don't know of any other method for fixing this. Any ideas?


Solution

  • You could set the order via the limits argument of scale_x_discrete:

    library(ggplot2)
    library(ggRandomForests)
    library(randomForestSRC)
    library(survival)
    
    # Assumes `pl` and `df` from the question's code already exist in the session
    plot(gg_partial(pl)) + 
      labs(x = "Category", y = "Outcome") +
      scale_x_discrete(limits = levels(df$var))
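
To try the idea without the random-forest packages, here is a minimal sketch using only ggplot2. The toy data frame, the `wanted` vector, and the `x_order()` helper are made up for illustration. The assumption (not checked against the ggRandomForests source) is that the partial-plot data reaches ggplot2 as plain character values, in which case a discrete axis falls back to alphabetical order unless `limits` says otherwise.

```r
library(ggplot2)

# Hypothetical toy data; `answer` is character, not factor, so no level order is stored
set.seed(1)
toy <- data.frame(
  answer = rep(c("Yes", "No", "Maybe"), each = 30),
  value  = rnorm(90),
  stringsAsFactors = FALSE
)

# The order we want on the x-axis
wanted <- c("Yes", "No", "Maybe")

p_default <- ggplot(toy, aes(x = answer, y = value)) + geom_boxplot()
p_ordered <- p_default + scale_x_discrete(limits = wanted)

# Illustrative helper: report the x-axis categories in drawing order
x_order <- function(p) layer_scales(p)$x$get_limits()

x_order(p_default)  # alphabetical
x_order(p_ordered)  # follows `wanted`
```

Printing `p_ordered` should draw the boxes as Yes, No, Maybe. Note that `limits` also drops any category that is not listed, so include every level you want shown.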
    
