Tags: r, violin-plot, party

partykit: Change the terminal node boxplots to violins


The partykit package offers a plotting function for decision trees, plot.constparty(), which can display the distribution of the response in each terminal node as a boxplot (node_boxplot()). A minimal example using the iris dataset is below.

library("partykit")
ct <- ctree(Petal.Length ~ Sepal.Length + Sepal.Width, data = iris, stump = TRUE)
plot(ct, terminal_panel = node_boxplot)

I would love to display violin plots instead of the boxplots. Since you can write your own panel functions, that should be possible. However, it seems that the violin plot needs to be set up using grid functions, and I have no clue how to do that. I imagine that this is quite cumbersome, but I believe that many users would benefit from such a panel function. Any suggestions on how to implement it? (A first lead points here: partykit: Change terminal node boxplots to bar graphs that shows mean and standard deviation)

Add-on: Assume we had a strategy to plot terminal nodes with violins. How could we apply this strategy to multivariate responses, so that violins are displayed instead of boxplots? See the following screenshot produced with the function node_mvar(): Decision tree with multivariate response: boxplots produced by node_mvar()


Solution

  • There are two natural strategies for this:

    1. Write a node_violinplot() panel-generating function similar to node_boxplot().
    2. Use ggplot2 via the ggparty package and leverage the existing geom_violin().

    For the first strategy, I would recommend copying the code of node_boxplot() (including setting its class!) and renaming it to, say, node_violinplot(). Most of its code is responsible for setting up the right viewports, axis ranges, etc., all of which can be preserved. One would then "only" replace the grid.lines() and grid.rect() calls that draw the boxes with calls that draw the violin. I'm not sure what would be the best way to compute the coordinates for the violin elements, though.
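
    One possible way to obtain the violin coordinates, offered only as a sketch and not as partykit functionality, is to run a kernel density estimate with stats::density() and mirror it around a vertical center line. The helper name violin_coords() and its arguments are hypothetical. The snippet below draws a single violin in a standalone grid viewport; inside a copied node_boxplot() the grid.polygon() call would presumably take the place of the box-drawing calls, using that function's own viewport and scales.

```r
library("grid")

## Hypothetical helper (not part of partykit): outline of a violin for a
## numeric vector y, centered at x = 0.5 with total width `width`.
violin_coords <- function(y, width = 0.8, n = 128) {
  ## kernel density restricted to the observed range of y
  d <- density(y, n = n, from = min(y), to = max(y))
  ## rescale the density so that its maximum equals half the width
  half <- d$y / max(d$y) * width / 2
  data.frame(
    x = c(0.5 - half, rev(0.5 + half)),  # left side upwards, right side downwards
    y = c(d$x, rev(d$x))
  )
}

## draw one violin in a viewport whose y-scale matches the response
y <- iris$Petal.Length
vc <- violin_coords(y)

grid.newpage()
pushViewport(viewport(width = 0.6, height = 0.8,
                      xscale = c(0, 1), yscale = range(y)))
grid.polygon(x = unit(vc$x, "native"), y = unit(vc$y, "native"),
             gp = gpar(fill = "lightgray"))
grid.yaxis()
grid.rect(gp = gpar(fill = NA))  # frame around the panel
popViewport()
```

    Scaling each violin to the same maximum width is an assumption made here for simplicity; scaling by area or by the number of observations in the node would be equally defensible choices.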

    For the second strategy, all building blocks are essentially available and just have to be customized to obtain the kind of violin plot that you want. For example:

    ggparty with geom_violin and geom_boxplot as geom_node_plot

    This plot can be replicated as follows:

    ## example tree
    library("partykit")
    ct <- ctree(dist ~ speed, data = cars)
    
    ## visualization with ggparty + geom_violin
    library("ggparty")
    ggparty(ct) +
      geom_edge() +
      geom_edge_label() +
      geom_node_splitvar() +
      geom_node_plot(gglist = list(
        geom_violin(aes(x = "", y = dist)),
        geom_boxplot(aes(x = "", y = dist), coef = Inf, width = 0.1, fill = "lightgray"),
        xlab(""),
        theme_minimal()
      ))