I would like to make an area plot showing the mean evolution of three variables (Vr, Hr and Ar) across a dilution series (x = dil) with 7 dilution steps. Each dilution has 5 replicates which I would like to summarise as a mean. The three variables represent proportions of the variable M (Vr + Hr + Ar = M). So I don't want to add up proportions to 100%, but stack the respective areas of each variable to reach M on top (added as a line).
How can I stack the means of the three variables in an area plot to show the proportional distribution at each dilution step? I tried this so far, but the layers don't stack:
mline <- ggplot(data = data, aes(x = dil), na.action=na.omit) +
stat_summary(aes(y = M, group = 1), fun = mean,
geom ="line") +
stat_summary(aes(y = Hr, group = 1), fun= mean,
geom ="area", position = "stack") +
stat_summary(aes(y = Ar, group = 1), fun= mean,
geom ="area", position = "stack") +
stat_summary(aes(y = Vr, group = 1), fun= mean,
geom ="area", position = "stack")
mline
My dataframe for this example looks like this:
dput(data)
structure(list(dil = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L,
5L, 5L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L), .Label = c("1",
"2", "3", "4", "5", "6", "7"), class = "factor"), M = c(0.366666667,
12.03333333, 1, 6.933333333, 4.533333333, 2.166666667, 5.633333333,
1, 1.4, 0, 15.66666667, 21.16666667, 6.033333333, 1, 0.2, 0,
4.533333333, 0.333333333, 0.166666667, 0, 0, 1, 0, 0.366666667,
0, 0.166666667, 0, 0, 0, NA, 1, 3.5, 0, NA, NA), Ar = c(0.100284295,
3.896431897, 0.333333333, 2.241353469, 1.540488607, 0.196969697,
2.118578371, 0.095357674, 0.200607926, 0, 3.605257275, 6.81946709,
0.930970496, 0.393446629, 0.03012711, 0, 0.468426671, 0.031017502,
0.065574438, 0, 0, 0.333333333, 0, 0.142139889, 0, 0.015151515,
0, 0, 0, NA, 0.090909091, 1.240533311, 0, NA, NA), Vr = c(0.010505974,
0.46853597, 0.333333333, 0.977669123, 0.43271556, 0.196969697,
0.749485112, 0, 0.051063836, 0, 3.262519219, 2.859413375, 0.641593028,
0.078689326, 0.009038133, 0, 0.637060272, 0.015508751, 0.013114888,
0, 0, 0.333333333, 0, 0, 0, 0.015151515, 0, 0, 0, NA, 0.090909091,
0.827022207, 0, NA, NA), Hr = c(0.255876398, 7.668365466, 0.333333333,
3.714310741, 2.560129166, 1.772727273, 2.765269851, 0.904642326,
1.148328239, 0, 8.798890173, 11.4877862, 4.460769809, 0.527864045,
0.160834757, 0, 3.427846391, 0.286807081, 0.087977341, 0, 0,
0.333333333, 0, 0.224526778, 0, 0.136363636, 0, 0, 0, NA, 0.818181818,
1.432444482, 0, NA, NA)), row.names = c(NA, -35L), class = "data.frame")
Probably it's simple, but I don't get it. Thanks a lot!
You have basically two options: (1) record your summary stats in whatever way you wish into another dataframe and then plot with that one, or (2) use stat_summary in ggplot2 to do all that for you. I'm going with option #2, which is much easier in this case.
The first step is to transform your dataset into one that respects Tidy Data Principles. It looks like you have one x column (dil), one total column (M), and three columns of data for the three variables Vr, Hr, and Ar. In this case you can leave M and dil alone, but I'm going to use gather() from tidyr to collapse the three variable columns into two: one for the variable name and one for the value:
library(ggplot2)
library(dplyr)
library(tidyr)
# where df is your data frame
df1 <- df %>%
  gather(key='var', value='value', -c(dil, M))
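As a side note, gather() still works, but tidyr marks it as superseded in favour of pivot_longer(). A minimal sketch of the same reshape with pivot_longer() follows. The small data frame is a made-up stand-in for your data (the values are invented for illustration only), and it reuses the df / df1 names from above:

```r
library(tidyr)

# Toy stand-in for the real data: values are made up for illustration,
# chosen so that Ar + Vr + Hr = M in every row.
df <- data.frame(
  dil = factor(c(1, 1, 2, 2)),
  M   = c(1.0, 3.0, 2.0, 4.0),
  Ar  = c(0.2, 1.0, 0.5, 1.0),
  Vr  = c(0.3, 0.5, 0.5, 1.0),
  Hr  = c(0.5, 1.5, 1.0, 2.0)
)

# Keep dil and M as they are; stack every other column into var/value pairs.
df1 <- pivot_longer(df, cols = -c(dil, M),
                    names_to = "var", values_to = "value")
```

Each original row becomes three rows in df1 (one per variable), with dil and M repeated alongside.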
Then you can plot using the stat_summary commands, but note that you should set group= to the var column.
m <- ggplot(df1, aes(x=dil, group=var)) +
  stat_summary(aes(y=M), geom='line', fun=mean) +
  stat_summary(aes(y=value, fill=var), alpha=0.5,
               geom='area', position='stack', fun=mean)
m
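For comparison, option (1) could look roughly like the sketch below: compute the per-dilution means yourself, then plot the summary table. This is only a sketch under assumptions. The toy data frame is made up for illustration, the names means, m_means and p are placeholders of mine, and na.rm = TRUE is my guess at how you want missing replicates handled.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Toy stand-in for the real data (made-up values, Ar + Vr + Hr = M per row).
df <- data.frame(
  dil = factor(c(1, 1, 2, 2)),
  M   = c(1.0, 3.0, 2.0, 4.0),
  Ar  = c(0.2, 1.0, 0.5, 1.0),
  Vr  = c(0.3, 0.5, 0.5, 1.0),
  Hr  = c(0.5, 1.5, 1.0, 2.0)
)

# Mean of each variable at each dilution step, in long format.
means <- df %>%
  pivot_longer(c(Ar, Vr, Hr), names_to = "var", values_to = "value") %>%
  group_by(dil, var) %>%
  summarise(value = mean(value, na.rm = TRUE), .groups = "drop")

# Mean of the total M at each dilution step, for the line on top.
m_means <- df %>%
  group_by(dil) %>%
  summarise(M = mean(M, na.rm = TRUE), .groups = "drop")

# Stack the precomputed means; stat = "identity" plots the values as given.
p <- ggplot(means, aes(x = dil)) +
  geom_area(aes(y = value, fill = var, group = var),
            stat = "identity", position = "stack", alpha = 0.5) +
  geom_line(data = m_means, aes(y = M, group = 1))
p
```

Because the mean is linear, the stacked means of Vr, Hr and Ar add up to the mean of M whenever the same rows go into each average, so the line should sit on top of the stack.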