
Stacking means of multiple variables in an area plot


I would like to make an area plot showing the mean evolution of three variables (Vr, Hr and Ar) across a dilution series (x = dil) with 7 dilution steps. Each dilution has 5 replicates which I would like to summarise as a mean. The three variables represent proportions of the variable M (Vr + Hr + Ar = M). So I don't want to add up proportions to 100%, but stack the respective areas of each variable to reach M on top (added as a line).

How can I stack the means of the three variables in an area plot to show the proportional distribution at each dilution step? I tried this so far, but the layers don't stack:

mline <- ggplot(data = data, aes(x = dil), na.action=na.omit) +
  stat_summary(aes(y = M, group = 1), fun = mean,
               geom ="line") +
  stat_summary(aes(y = Hr, group = 1), fun= mean,
               geom ="area", position = "stack") +
  stat_summary(aes(y = Ar, group = 1), fun= mean,
               geom ="area", position = "stack") +
  stat_summary(aes(y = Vr, group = 1), fun= mean,
               geom ="area", position = "stack")
mline

My dataframe for this example looks like this:

dput(data)
structure(list(dil = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 
5L, 5L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L), .Label = c("1", 
"2", "3", "4", "5", "6", "7"), class = "factor"), M = c(0.366666667, 
12.03333333, 1, 6.933333333, 4.533333333, 2.166666667, 5.633333333, 
1, 1.4, 0, 15.66666667, 21.16666667, 6.033333333, 1, 0.2, 0, 
4.533333333, 0.333333333, 0.166666667, 0, 0, 1, 0, 0.366666667, 
0, 0.166666667, 0, 0, 0, NA, 1, 3.5, 0, NA, NA), Ar = c(0.100284295, 
3.896431897, 0.333333333, 2.241353469, 1.540488607, 0.196969697, 
2.118578371, 0.095357674, 0.200607926, 0, 3.605257275, 6.81946709, 
0.930970496, 0.393446629, 0.03012711, 0, 0.468426671, 0.031017502, 
0.065574438, 0, 0, 0.333333333, 0, 0.142139889, 0, 0.015151515, 
0, 0, 0, NA, 0.090909091, 1.240533311, 0, NA, NA), Vr = c(0.010505974, 
0.46853597, 0.333333333, 0.977669123, 0.43271556, 0.196969697, 
0.749485112, 0, 0.051063836, 0, 3.262519219, 2.859413375, 0.641593028, 
0.078689326, 0.009038133, 0, 0.637060272, 0.015508751, 0.013114888, 
0, 0, 0.333333333, 0, 0, 0, 0.015151515, 0, 0, 0, NA, 0.090909091, 
0.827022207, 0, NA, NA), Hr = c(0.255876398, 7.668365466, 0.333333333, 
3.714310741, 2.560129166, 1.772727273, 2.765269851, 0.904642326, 
1.148328239, 0, 8.798890173, 11.4877862, 4.460769809, 0.527864045, 
0.160834757, 0, 3.427846391, 0.286807081, 0.087977341, 0, 0, 
0.333333333, 0, 0.224526778, 0, 0.136363636, 0, 0, 0, NA, 0.818181818, 
1.432444482, 0, NA, NA)), row.names = c(NA, -35L), class = "data.frame")

Probably it's simple, but I don't get it. Thanks a lot!


Solution

  • You basically have two options: (1) compute the summary statistics yourself, store them in a separate data frame, and plot that, or (2) let stat_summary in ggplot2 compute them for you. I'm going with option 2, which is much easier in this case.

    The first step is to reshape your dataset into long ("tidy") format. You have one x column (dil), one total column (M), and three columns holding the variables Vr, Hr, and Ar. You can leave dil and M alone; I'm going to use gather() from tidyr to turn the three variable columns into two columns, one for the variable name and one for the value:

    library(ggplot2)
    library(dplyr)
    library(tidyr)
    
    # df is your data frame (named `data` in the question)
    df1 <- df %>%
      gather(key='var', value='value', -c(dil, M))
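
As a side note, gather() is marked as superseded in current tidyr releases, and pivot_longer() does the same reshape. Here is a minimal sketch on a small made-up data frame (the values are illustrative only and are not taken from the question's data) that shows what the long format looks like:

```r
library(tidyr)

# Made-up data with the same column layout as the question's data frame
df <- data.frame(
  dil = factor(c(1, 1, 2, 2)),
  M   = c(1.0, 2.0, 3.0, 4.0),
  Ar  = c(0.2, 0.5, 1.0, 1.5),
  Vr  = c(0.3, 0.5, 0.5, 0.5),
  Hr  = c(0.5, 1.0, 1.5, 2.0)
)

# Same reshape as gather(): keep dil and M, stack Ar, Vr and Hr into var/value
df1 <- pivot_longer(df, cols = c(Ar, Vr, Hr),
                    names_to = "var", values_to = "value")

head(df1)
```

Each original row becomes three rows, one per variable, with dil and M repeated on each of them.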
    

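    Then you can plot with stat_summary, but note that you need to set group = var so that the three variables are summarised and stacked separately. Your original layers did not stack because position = "stack" only stacks groups within a single layer, and each of your layers contained just one group.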

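    m <- ggplot(df1, aes(x=dil, group=var)) +
      # mean of M per dilution step, drawn as the line on top
      stat_summary(aes(y=M), geom='line', fun=mean) +
      # mean of each variable per dilution step, stacked as areas
      stat_summary(aes(y=value, fill=var), alpha=0.5,
         geom='area', position='stack', fun=mean)
    m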
    

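
For completeness, option 1 (computing the means first and plotting the summary) could look roughly like the sketch below. It is only an illustration under some assumptions: the data frame is a small made-up stand-in with the same columns as the question's data, the names means and means_long are arbitrary, and NAs are dropped with na.rm = TRUE.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Made-up stand-in for the question's data (values are illustrative only)
df <- data.frame(
  dil = factor(rep(1:3, each = 2)),
  M   = c(1.0, 2.0, 3.0, 4.0, 0.5, NA),
  Ar  = c(0.2, 0.5, 1.0, 1.5, 0.1, NA),
  Vr  = c(0.3, 0.5, 0.5, 0.5, 0.1, NA),
  Hr  = c(0.5, 1.0, 1.5, 2.0, 0.3, NA)
)

# One row per dilution step holding the mean of every column, ignoring NAs
means <- df %>%
  group_by(dil) %>%
  summarise(across(c(M, Ar, Vr, Hr), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop")

# Long format for the three components only; M stays in the wide summary
means_long <- means %>%
  pivot_longer(c(Ar, Vr, Hr), names_to = "var", values_to = "value")

# Stacked areas from the long summary, M line from the wide summary
p <- ggplot(means_long, aes(x = dil)) +
  geom_area(aes(y = value, fill = var, group = var),
            alpha = 0.5, position = "stack") +
  geom_line(data = means, aes(y = M, group = 1))
p
```

The top of the stack coincides with the M line as long as the NAs fall in the same rows for all four columns, which is the case in the question's data. The advantage of this route is that the summary table can be inspected or reused before plotting.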