Using ggplot and facet_grid, I'd like to visualize two parallel vectors of values as box plots. My available data:
DF <- data.frame("value" = runif(50, 0, 1),
"value2" = runif(50,0,1),
"type1" = c(rep("AAAAAAAAAAAAAAAAAAAAAA", 25),
rep("BBBBBBBBBBBBBBBBB", 25)),
"type2" = rep(c("c", "d"), 25),
"number" = rep(2:6, 10))
My current code visualizes only one vector of values:
ggplot(DF, aes(y=value, x=type1)) +
geom_boxplot(alpha=.3, aes(fill = type1)) +
ggtitle("TITLE") +
facet_grid(type2 ~ number) +
scale_x_discrete(name = NULL, breaks = NULL) + # these lines are optional
theme(legend.position = "bottom")
This is my plot at the moment.
I'd like to show parallel box plots, one for each vector (value and value2 in the data frame). That is, for each colored box plot, I'd like to have two box plots: one for value and another for value2.
I think there's likely a post that already addresses it, in addition to the one I linked to above. But this problem comes down to two things: 1) getting data into the format that ggplot expects, i.e. long-shaped so there are values to map onto aesthetics, and 2) separation of concerns, in that you can use reshape2 or (more up-to-date) tidyr functions to get data into the proper shape, and ggplot2 functions to plot it.
You can use tidyr::gather for getting long data, and conveniently pipe it directly into ggplot.
library(tidyverse)
...
To illustrate, though with very generic column names:
DF %>%
gather(key, value = val, value, value2) %>%
head()
#> type1 type2 number key val
#> 1 AAAAAAAAAAAAAAAAAAAAAA c 2 value 0.5075600
#> 2 AAAAAAAAAAAAAAAAAAAAAA d 3 value 0.6472347
#> 3 AAAAAAAAAAAAAAAAAAAAAA c 4 value 0.7543778
#> 4 AAAAAAAAAAAAAAAAAAAAAA d 5 value 0.7215786
#> 5 AAAAAAAAAAAAAAAAAAAAAA c 6 value 0.1529630
#> 6 AAAAAAAAAAAAAAAAAAAAAA d 2 value 0.8779413
Pipe that directly into ggplot:
DF %>%
gather(key, value = val, value, value2) %>%
ggplot(aes(x = key, y = val, fill = type1)) +
geom_boxplot() +
facet_grid(type2 ~ number) +
theme(legend.position = "bottom")
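As a side note, in tidyr 1.0.0 and later gather is superseded by pivot_longer (gather still works). A minimal sketch of the same reshape and plot with pivot_longer follows; it assumes the DF from the question, and the set.seed call and the names long and p are only there to make the example self-contained. Row order differs from gather's output, but the plot should be unaffected.

```r
library(tidyr)
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the example is reproducible
DF <- data.frame(
  value  = runif(50, 0, 1),
  value2 = runif(50, 0, 1),
  type1  = c(rep("AAAAAAAAAAAAAAAAAAAAAA", 25),
             rep("BBBBBBBBBBBBBBBBB", 25)),
  type2  = rep(c("c", "d"), 25),
  number = rep(2:6, 10)
)

# Stack value and value2 into one column (val), with their names in key
long <- pivot_longer(DF,
                     cols      = c(value, value2),
                     names_to  = "key",
                     values_to = "val")

# Same aesthetics as the gather version above
p <- ggplot(long, aes(x = key, y = val, fill = type1)) +
  geom_boxplot() +
  facet_grid(type2 ~ number) +
  theme(legend.position = "bottom")
```

Printing p (or calling print(p) in a script) draws the plot.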
Again, because of the generic column names, I'm not entirely sure this is the setup you want; for example, I don't know how value / value2 differ from AAAAAAA / BBBBBBB. You might need to swap the aes assignments around accordingly.
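For instance, if the goal is a pair of boxes (value and value2) side by side for each type1 group, one possible swap is to put type1 on the x axis and map fill to key. This is only a sketch of one reading of the question: the set.seed call, the names long and p, and the label-shortening function are my own assumptions, not part of the original code.

```r
library(tidyr)
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the example is reproducible
DF <- data.frame(
  value  = runif(50, 0, 1),
  value2 = runif(50, 0, 1),
  type1  = c(rep("AAAAAAAAAAAAAAAAAAAAAA", 25),
             rep("BBBBBBBBBBBBBBBBB", 25)),
  type2  = rep(c("c", "d"), 25),
  number = rep(2:6, 10)
)

# Same long reshape as in the answer
long <- gather(DF, key, value = val, value, value2)

# Swapped aesthetics: one x position per type1, boxes dodged and filled by key
p <- ggplot(long, aes(x = type1, y = val, fill = key)) +
  geom_boxplot(alpha = .3) +
  facet_grid(type2 ~ number) +
  # optional: shorten the long type1 labels to their first letter
  scale_x_discrete(name = NULL, labels = function(x) substr(x, 1, 1)) +
  theme(legend.position = "bottom")
```

Here the legend distinguishes value from value2, while the x axis distinguishes the two type1 groups; swap the x and fill mappings back if the opposite grouping reads better.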