I am trying to generate a grouped boxplot in ggplot2 with two x variables. This is straightforward with:
ggplot(boxplot_classes, aes(x=Group, y=Value, fill=Mutation)) +
geom_boxplot(position=position_dodge(0.8))
However, I do not want to compare the two subgroups defined by the second x variable. Instead, for each group defined by the first x variable, I need to compare all samples in that group with one single subgroup from the second x variable.
Here is an example. The data looks like this:
Value Mutation Group
32.00 Yes 1
5.00 no 1
18.00 no 1
3.00 no 1
16.00 no 1
14.00 Yes 1
28.00 Yes 1
28.00 Yes 1
49.00 Yes 1
15.00 Yes 1
43.00 no 2
49.00 Yes 2
40.00 Yes 2
17.00 Yes 2
9.00 no 2
31.00 Yes 2
8.00 Yes 2
43.00 no 2
50.00 Yes 2
48.00 Yes 2
11.00 Yes 3
42.00 no 3
0.00 Yes 3
15.00 Yes 3
8.00 no 3
1.00 Yes 3
41.00 no 3
15.00 no 3
4.00 no 3
31.00 Yes 3
I would like to generate a figure where, for each "Group" (in the example above: 1, 2, 3), two boxplots are drawn: one for all samples in this "Group" and one only for those samples in this group that also have Mutation == "Yes". In the real data, many more "Groups" are present.
I hope I have explained my problem clearly. Unfortunately, I cannot work out the correct syntax or how the data has to be rearranged.
Thank you very much for any help!
EDIT: I uploaded an example of the figure I am trying to generate at https://s28.postimg.org/hvq8pb25p/Folie1.jpg
If we play with your data a bit, we can do it. Suppose your data is in dat:
library(ggplot2)

dat_yes <- dat[dat$Mutation == 'Yes', ]  # subset: only the 'Yes' rows
dat_yes$Mutation_2 <- 'Yes'              # label the subset
dat$Mutation_2 <- 'All'                  # label the complete data
dat_full <- rbind(dat, dat_yes)          # stack both ('Yes' rows now appear twice)

# plot: one box per Group and Mutation_2 level
ggplot(dat_full, aes(x = factor(Group), y = Value)) +
  geom_boxplot(aes(fill = Mutation_2)) +
  xlab('Group') +
  scale_fill_brewer(palette = 'Set1', name = 'Mutation')
First, we create a subset of your data called dat_yes, which only contains the rows with Mutation == 'Yes'. We then create a new column in dat_yes called Mutation_2, which takes the value 'Yes' only. Next, we add a column to your original data, also called Mutation_2, which only takes the value 'All'. Then, we rbind dat and dat_yes to create dat_full. Finally, we send dat_full to ggplot.
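As a quick sanity check on the reshaping step, here is a minimal sketch that applies the same duplicate-and-relabel idea to a small made-up data frame and counts the observations behind each box. The name toy and its values are hypothetical and not taken from the question. It only uses base R, so it runs without ggplot2.

```r
# Hypothetical toy data, for illustration only (not the question's data)
toy <- data.frame(
  Value    = c(10, 20, 30, 40, 50, 60),
  Mutation = c("Yes", "no", "Yes", "no", "no", "Yes"),
  Group    = c(1L, 1L, 1L, 2L, 2L, 2L),
  stringsAsFactors = FALSE
)

toy_yes <- toy[toy$Mutation == "Yes", ]  # rows with the mutation only
toy_yes$Mutation_2 <- "Yes"
toy$Mutation_2 <- "All"                  # every row counts towards "All"
toy_full <- rbind(toy, toy_yes)          # "Yes" rows are now present twice

# Make the level order explicit so "All" comes before "Yes" in each group
toy_full$Mutation_2 <- factor(toy_full$Mutation_2, levels = c("All", "Yes"))

# Number of observations that would feed each box
counts <- table(Group = toy_full$Group, Mutation_2 = toy_full$Mutation_2)
print(counts)
```

The "All" column should match the group sizes of the original data, and the "Yes" column should match the number of mutated samples per group. If a group has no "Yes" rows at all, its second box will simply be missing from the plot.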
The example data as a reproducible object:
dat <- structure(list(Value = c(32, 5, 18, 3, 16, 14, 28, 28, 49, 15,
43, 49, 40, 17, 9, 31, 8, 43, 50, 48, 11, 42, 0, 15, 8, 1, 41,
15, 4, 31), Mutation = c("Yes", "no", "no", "no", "no", "Yes",
"Yes", "Yes", "Yes", "Yes", "no", "Yes", "Yes", "Yes", "no",
"Yes", "Yes", "no", "Yes", "Yes", "Yes", "no", "Yes", "Yes",
"no", "Yes", "no", "no", "no", "Yes"), Group = c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L)), .Names = c("Value",
"Mutation", "Group"), class = "data.frame", row.names = c(NA,
-30L))