Tags: r, ggplot2, jupyter-notebook, facet-wrap, facet-grid

What is a good output format to combine multiple similar plots in R?


Let's say I have 100 variable columns and 1 label column. The label is categorical, for example 1, 2, and 3. For each variable, I would like to generate a plot (e.g. a boxplot) for each category. Is there a good format for showing all of the plots? With facet_grid, it seems that only about 16 plots can be put together; beyond that, each plot becomes too small.

Example code:

label = sample.int(3, 50, replace = TRUE)
var = as.matrix(matrix(rnorm(5000),50,100))
data = as.data.frame(cbind(var,label))

Solution

  • Ultimately, if you want a box for each of 3 groups for each column of your data, then you would need 300 boxes in total. This seems like a bad idea from a data visualisation perspective. A plot should allow your data to tell a story, but the only story a plot like that could show is "I can make a very crowded plot". In terms of getting it to look nice, you would need a lot of room to plot this, so if it were on a large poster it might work.

    To fit it all onto a single page with minimal room taken up by axis annotations, you could do something like:

    library(tidyverse)
    
    pivot_longer(data, -label) %>%
      mutate(name = as.numeric(sub('V', '', name))) %>%
      mutate(row = (name - 1) %/% 20,
             label = factor(label)) %>%
      ggplot(aes(factor(name), value, fill = label)) +
      geom_boxplot() +
      facet_wrap(row~., nrow = 5, scales = 'free_x') +
      labs(x = "data frame column") +
      theme(strip.background = element_blank(),
            strip.text = element_blank())
    

    But this is still far from ideal.
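
    For reference, the `row` variable in the code above comes from integer division, so columns 1 to 20 land in the first facet row, 21 to 40 in the second, and so on. A minimal base-R sketch of that grouping, where the names `cols`, `row_id`, and `n_boxes` are illustrative only and not part of the code above:

    ```r
    # Column numbers as they appear after stripping the "V" prefix
    cols <- 1:100

    # Integer division assigns 20 consecutive columns to each facet row (0 to 4)
    row_id <- (cols - 1) %/% 20

    # Number of columns that end up in each facet row
    table(row_id)

    # Total boxes drawn: one per column per label level (3 levels in the example)
    n_boxes <- length(cols) * 3
    ```

    Changing the divisor changes how many columns share a row, so it would need to be adjusted together with `nrow` in `facet_wrap()`.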

    An alternative, depending on the nature of your data columns, would be to plot the column number as a continuous variable. That way, you can represent the distribution in each column via its density, allowing for a heatmap-type plot which might actually convey your data's story better:

    pivot_longer(data, -label) %>%
      mutate(x = as.numeric(sub('V', '', name))) %>%
      mutate(label = factor(label)) %>%
      group_by(x, label) %>%
      # reframe() (dplyr >= 1.1.0) allows several rows per group; with older dplyr use summarize()
      reframe(y = density(value, from = -6, to = 6)$x,
              z = density(value, from = -6, to = 6)$y) %>%
      ggplot(aes(x, y, fill = label, alpha = z)) +
      geom_raster() +
      coord_cartesian(expand = FALSE) +
      labs(x = 'data frame column', y = 'value', alpha = 'density') +
      facet_grid(label~.) +
      guides(fill = 'none') +
      theme_bw()
    

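
    To see what the density step feeds into `geom_raster()`, it can help to look at a single `density()` call on its own. The sketch below uses a made-up vector `v` as a stand-in for one column's values within one label group (an assumption for illustration, not the question's data). `density()` evaluates the estimate on an evenly spaced grid between `from` and `to`, which is what gives the raster its regular y positions:

    ```r
    set.seed(1)

    # Hypothetical stand-in for one column's values within one label group
    v <- rnorm(17)

    # Kernel density estimate on a fixed grid from -6 to 6
    d <- density(v, from = -6, to = 6)

    length(d$x)   # number of grid points (default n = 512)
    range(d$x)    # grid spans the requested from/to limits
    head(d$y)     # estimated density at the first few grid points
    ```

    Because every column and label group is evaluated on the same grid, the tiles line up across the x axis. The fixed limits of -6 and 6 suit standard normal data like the example; other data would likely need different limits.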