Let's say I have 100 variable columns and 1 label column. The label is categorical, for example 1, 2, and 3. For each variable I would like to generate a plot for each category (e.g. a boxplot). Is there a good format to show all the plots? With facet_grid, it seems we can only put about 16 plots together; otherwise each plot becomes too small.
Example code:
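set.seed(1)                                  # make the random example reproducible
label = sample.int(3, 50, replace = TRUE)    # 50 observations, categories 1 to 3
var = matrix(rnorm(5000), 50, 100)           # 50 x 100 matrix of variables
data = as.data.frame(cbind(var, label))      # columns V1..V100 plus label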
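The answer below relies on `as.data.frame()` naming the unnamed matrix columns V1 to V100 (so that `sub('V', '', name)` recovers the column number) and on there being one box per variable/category pair. A quick base-R check, rebuilding the same example data (the seed is an arbitrary choice added only for reproducibility):

```r
set.seed(1)  # arbitrary seed, only for reproducibility
label <- sample.int(3, 50, replace = TRUE)
var <- matrix(rnorm(5000), 50, 100)
data <- as.data.frame(cbind(var, label))

# Unnamed matrix columns become V1..V100; the named vector keeps "label"
head(names(data), 3)
tail(names(data), 2)

# One box per (variable column, category) pair
n_vars <- ncol(data) - 1
n_categories <- 3  # labels were drawn from 1:3
n_vars * n_categories  # 300
```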
Ultimately, if you want a box for each of the 3 groups in each of the 100 columns, you need 300 boxes in total. That seems like a bad idea from a data visualisation perspective: a plot should let your data tell a story, but the only story a plot like that could tell is "I can make a very crowded plot". Making it look good would need a lot of room, so it might work on a large poster.
To fit it all into a single page, with minimal room taken up by axis annotations, you could do something like:
library(tidyverse)

pivot_longer(data, -label) %>%                     # long format: label, name (V1..V100), value
  mutate(name = as.numeric(sub('V', '', name)),    # "V17" -> 17
         row = (name - 1) %/% 20,                  # 20 columns per facet panel
         label = factor(label)) %>%
  ggplot(aes(factor(name), value, fill = label)) +
  geom_boxplot() +
  facet_wrap(~ row, nrow = 5, scales = 'free_x') + # 5 panels stacked vertically
  labs(x = "data frame column") +
  theme(strip.background = element_blank(),        # hide the panel strips,
        strip.text = element_blank())              # which would only show the row index
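The `row` column is what spreads the 100 boxplot groups over five panels: integer division of the zero-based column index by 20 puts columns 1 to 20 in panel 0, columns 21 to 40 in panel 1, and so on. A base-R sketch of just that arithmetic (`cols_per_panel` is a hypothetical name for the 20 that the code hard-codes):

```r
# Column numbers as recovered by sub('V', '', name) from V1..V100
col_num <- as.numeric(sub("V", "", paste0("V", 1:100)))

cols_per_panel <- 20  # hypothetical name for the hard-coded 20
row <- (col_num - 1) %/% cols_per_panel  # zero-based panel index

table(row)                # 20 columns in each of panels 0 to 4
range(col_num[row == 1])  # panel 1 holds columns 21 to 40
```

With `cols_per_panel <- 25` the same arithmetic gives four panels of 25 columns, in which case `nrow = 4` in `facet_wrap()` would match.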
But this is still far from ideal.
An alternative, depending on the nature of your data columns, would be to plot the column number as a continuous variable. That way, you can represent the distribution in each column via its density, allowing for a heatmap-type plot which might actually convey your data's story better:
pivot_longer(data, -label) %>%
  mutate(x = as.numeric(sub('V', '', name)),    # column number as a continuous x
         label = factor(label)) %>%
  group_by(x, label) %>%
  # one output row per density grid point; -6..6 covers the standard-normal example
  # (reframe() needs dplyr >= 1.1.0; older versions accept summarize() here)
  reframe(y = density(value, from = -6, to = 6)$x,
          z = density(value, from = -6, to = 6)$y) %>%
  ggplot(aes(x, y, fill = label, alpha = z)) +
  geom_raster() +
  coord_cartesian(expand = FALSE) +
  labs(x = 'data frame column', y = 'value', alpha = 'density') +
  facet_grid(label ~ .) +
  guides(fill = 'none') +
  theme_bw()
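The heatmap works because `density()` evaluates every (column, label) group on the same fixed grid from -6 to 6, so the raster tiles line up across columns. That range is an assumption suited to the standard-normal example data; real data would need a range covering its own values. A base-R check of what a single group contributes (the 17 random values are a hypothetical stand-in for one column within one label):

```r
set.seed(1)         # arbitrary seed for a reproducible example
value <- rnorm(17)  # stand-in for one column's values within one label group

d <- density(value, from = -6, to = 6)

length(d$x)  # 512 grid points, the default n
range(d$x)   # the grid spans -6 to 6

# Rows fed to geom_raster(): columns x labels x grid points
100 * 3 * length(d$x)  # 153600
```

Each group therefore returns many rows rather than one, which is why the grouped step has to allow multi-row results per group.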