r, ggplot2, reference, grid, density-plot

grid of density plots with reference data plotted in each group


Let's say I have a data frame:

df = data.frame(var = c("a", "a", "b", "b", "c", "c", "a", "a", "b", "b", "c", "c", "a", "a", "b",  "b", "c", "c"),
                source = c("ref", "ref", "ref", "ref", "ref", "ref", "source1", "source1", "source1", "source1", "source1", "source1", "source2", "source2", "source2", "source2", "source2", "source2"),
                value = c(2.5, 1, 3.5, 1.6, 2.2, 3.1, 2, 1.2, 1.8, 0.4, 1.4, 1.3, 3, 2.8, 4, 3.6, 2.9, 3.8))

> df
   var  source value
1    a     ref   2.5
2    a     ref   1.0
3    b     ref   3.5
4    b     ref   1.6
5    c     ref   2.2
6    c     ref   3.1
7    a source1   2.0
8    a source1   1.2
9    b source1   1.8
10   b source1   0.4
11   c source1   1.4
12   c source1   1.3
13   a source2   3.0
14   a source2   2.8
15   b source2   4.0
16   b source2   3.6
17   c source2   2.9
18   c source2   3.8

and I would like to generate density plots for value for each var / source pair. That works with:

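library(tidyverse)  # attaches ggplot2 and the %>% pipe

df %>%
  ggplot(aes(x = value)) +
  # after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
  geom_density(aes(y = after_stat(density), fill = source), adjust = 1, alpha = 0.5) +
  facet_grid(source ~ var, scales = "fixed") +
  theme_bw()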

producing a grid of density plots with one row per source (ref, source1, source2) and one column per var.

But what I really want, based on this example, is only two rows, corresponding to source1 and source2, with an additional density curve in each plot based on the values from ref.

I tried to find a solution following this post but I did not succeed. In other words, I would like each plot in the grid to show the distribution of the values in ref as a reference, and the ref group to be left out of the plot legend.

Any help is highly appreciated. Thank you.


Solution

  • One option is to split your data frame in two: one containing the reference values and one containing the rest. For the data frame with the reference values we also have to drop the source column. A layer whose data lacks a faceting variable is repeated in every panel along that dimension, so the reference curve then shows up in both the source1 and source2 rows. Then use two geom_density layers. Keeping the reference out of the legend is straightforward: do not map fill inside aes() for that layer, and set the fill color you want (if any) as a fixed parameter instead. In the code below it is simply set to fill = NA.

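    library(ggplot2)
    
    # Reference rows only; drop `source` so this layer is drawn in every facet row
    df1 <- df[df$source == "ref", names(df) != "source"]
    # All remaining rows keep `source` for faceting and fill
    df2 <- df[df$source != "ref", ]
    
    ggplot(mapping = aes(x = value)) +
      # fill = NA is a fixed parameter, so the reference curve gets no legend entry
      geom_density(data = df1, aes(y = after_stat(density)), fill = NA, adjust = 1, alpha = 0.5) +
      geom_density(data = df2, aes(y = after_stat(density), fill = source), adjust = 1, alpha = 0.5) +
      facet_grid(source ~ var, scales = "fixed") +
      theme_bw()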
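
    To check that the reference layer really is repeated in every panel, here is a minimal self-contained sketch. It rebuilds the example data, splits it with subset(), and then counts the panels in which the first layer is drawn. The names ref, obs, p and n_ref_panels are only illustrative, and the dashed linetype is my own choice to set the reference curve apart; neither comes from the question.

```r
library(ggplot2)

# Same data as in the question, built with rep() for brevity
df <- data.frame(
  var    = rep(rep(c("a", "b", "c"), each = 2), times = 3),
  source = rep(c("ref", "source1", "source2"), each = 6),
  value  = c(2.5, 1, 3.5, 1.6, 2.2, 3.1,
             2, 1.2, 1.8, 0.4, 1.4, 1.3,
             3, 2.8, 4, 3.6, 2.9, 3.8)
)

# Reference rows without the faceting column `source` (illustrative name)
ref <- subset(df, source == "ref", select = -source)
# Remaining rows keep `source` (illustrative name)
obs <- subset(df, source != "ref")

p <- ggplot(mapping = aes(x = value)) +
  # Assumption: dashed outline, no fill, to distinguish the reference curve
  geom_density(data = ref, fill = NA, linetype = "dashed") +
  geom_density(data = obs, aes(fill = source), alpha = 0.5) +
  facet_grid(source ~ var) +
  theme_bw()

# Number of distinct panels in which layer 1 (the reference) is drawn
n_ref_panels <- length(unique(layer_data(p, 1)$PANEL))
n_ref_panels
```

    If the replication works as described, n_ref_panels should equal the number of panels in the grid (2 sources by 3 vars), and the legend should list only source1 and source2.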