r ggplot2 facet-grid

R ggplot facet_grid multi boxplot


Using ggplot and facet_grid, I'd like to visualize two parallel vectors of values as box plots. My data:

DF <- data.frame("value" =  runif(50, 0, 1),
             "value2" = runif(50,0,1),
             "type1" = c(rep("AAAAAAAAAAAAAAAAAAAAAA", 25), 
                         rep("BBBBBBBBBBBBBBBBB", 25)),
             "type2" = rep(c("c", "d"), 25), 
             "number" = rep(2:6, 10))

My current code visualizes only one vector of values:

library(ggplot2)

ggplot(DF, aes(y = value, x = type1)) + 
  geom_boxplot(alpha = .3, aes(fill = type1)) + 
  ggtitle("TITLE") + 
  facet_grid(type2 ~ number) +
  scale_x_discrete(name = NULL, breaks = NULL) + # optional: hides the x-axis title, ticks, and labels
  theme(legend.position = "bottom")

This is my plot at the moment.


I'd like to draw parallel box plots, one for each vector (value and value2 in the data frame). That is, for each colored box plot, I'd like two box plots: one for value and one for value2.


Solution

  • There is likely an existing post that already addresses this, in addition to the one I linked to above. The problem comes down to two things: 1) getting the data into the format that ggplot expects, i.e. long-shaped, so that there are values to map onto aesthetics, and 2) separation of concerns: use reshape2 or (more up to date) tidyr functions to get the data into the proper shape, and ggplot2 functions to plot it.

    You can use tidyr::gather to reshape the data to long format, and conveniently pipe the result directly into ggplot.

    library(tidyverse)
    ...
    

    To illustrate, though with very generic column names:

    DF %>%
      gather(key, value = val, value, value2) %>%
      head()
    #>                    type1 type2 number   key       val
    #> 1 AAAAAAAAAAAAAAAAAAAAAA     c      2 value 0.5075600
    #> 2 AAAAAAAAAAAAAAAAAAAAAA     d      3 value 0.6472347
    #> 3 AAAAAAAAAAAAAAAAAAAAAA     c      4 value 0.7543778
    #> 4 AAAAAAAAAAAAAAAAAAAAAA     d      5 value 0.7215786
    #> 5 AAAAAAAAAAAAAAAAAAAAAA     c      6 value 0.1529630
    #> 6 AAAAAAAAAAAAAAAAAAAAAA     d      2 value 0.8779413
    

    Pipe that directly into ggplot:

    DF %>%
      gather(key, value = val, value, value2) %>%
      ggplot(aes(x = key, y = val, fill = type1)) +
        geom_boxplot() +
        facet_grid(type2 ~ number) +
        theme(legend.position = "bottom")
    
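For readers on tidyr 1.0.0 or later, gather() is superseded by pivot_longer(). Below is a minimal sketch of the same reshape and plot, assuming tidyr >= 1.0.0 and ggplot2 are installed. It reuses the key and val column names from above; the set.seed() call and the DF_long and p names are additions made only for this illustration.

```r
library(tidyr)
library(ggplot2)

set.seed(1)  # assumption: added only so the random data is reproducible
DF <- data.frame(value  = runif(50, 0, 1),
                 value2 = runif(50, 0, 1),
                 type1  = c(rep("AAAAAAAAAAAAAAAAAAAAAA", 25),
                            rep("BBBBBBBBBBBBBBBBB", 25)),
                 type2  = rep(c("c", "d"), 25),
                 number = rep(2:6, 10))

# Stack value and value2 into one column ("val"), labelled by "key"
DF_long <- pivot_longer(DF, cols = c(value, value2),
                        names_to = "key", values_to = "val")

# Same plot as the gather() version
p <- ggplot(DF_long, aes(x = key, y = val, fill = type1)) +
  geom_boxplot() +
  facet_grid(type2 ~ number) +
  theme(legend.position = "bottom")
p
```

The row order of DF_long differs from what gather() returns (each original row becomes two adjacent rows), but the resulting plot should be the same.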

    Again, because of the generic column names, I'm not entirely sure this is the setup you want. For example, I don't know how value / value2 relate to AAAAAAA / BBBBBBB. You might need to swap the aes assignments around accordingly.
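As one possible way of swapping the aes assignments, the sketch below puts type1 on the x-axis and fills by key, so each type1 group shows two side-by-side boxes, one for value and one for value2. This is only a guess at the intended layout, not a definitive answer. The set.seed() call, the DF_long and p names, and the label-shortening function are additions made for this illustration.

```r
library(tidyr)
library(ggplot2)

set.seed(1)  # assumption: added only so the random data is reproducible
DF <- data.frame(value  = runif(50, 0, 1),
                 value2 = runif(50, 0, 1),
                 type1  = c(rep("AAAAAAAAAAAAAAAAAAAAAA", 25),
                            rep("BBBBBBBBBBBBBBBBB", 25)),
                 type2  = rep(c("c", "d"), 25),
                 number = rep(2:6, 10))

# Long format, as in the answer above
DF_long <- gather(DF, key, value = val, value, value2)

# Swapped mapping: x is type1, fill distinguishes value from value2
p <- ggplot(DF_long, aes(x = type1, y = val, fill = key)) +
  geom_boxplot(alpha = .3) +
  facet_grid(type2 ~ number) +
  # shorten the long type1 labels to their first letter so they fit
  scale_x_discrete(name = NULL, labels = function(x) substr(x, 1, 1)) +
  theme(legend.position = "bottom")
p
```

With this mapping the legend identifies value and value2, while the x-axis labels identify the type1 group.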