r ggplot2 frequency categorical-data

R & ggplot2 - how to plot relative frequency of a categorical split by a binary variable


I can easily make a relative frequency plot with one 'base' categorical variable along the x-axis and the relative frequency of another categorical variable on the y-axis:

library(ggplot2)
ggplot(diamonds) +
  aes(x = cut, fill = color) +
  geom_bar(position = "fill")
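
For reference, the bar segments drawn by position = "fill" correspond to the share of each color within each cut. A minimal sketch that computes the same proportions as a table, assuming the diamonds data that ships with ggplot2 (the names counts and props are only illustrative):

```r
library(ggplot2)  # only needed here for the diamonds dataset

# Counts of each color within each cut (rows = cut, columns = color)
counts <- table(diamonds$cut, diamonds$color)

# Row-wise proportions: within each cut the colors sum to 1,
# which is what the stacked bars show
props <- prop.table(counts, margin = 1)
round(props, 3)
```

Each row of props sums to 1, matching the full-height bars in the plot.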

Now say I have that categorical variable split in some way by a binary variable:

diamonds <- data.frame(diamonds)
diamonds$binary_dummy <- sample(c(0, 1), nrow(diamonds), replace = TRUE)

How do I plot the original categorical variable so that each level of the colour ('color') variable is also split by the binary variable? Preferably, the split would be represented by two different shades of the original colour.

Basically I am trying to reproduce this plot: Freq_plot_example

As you can see from the legend, each category is split by "NonSyn"/"Syn" and each split is coloured as a dark/light shade of another distinct colour (e.g. "regulatory proteins NonSyn" = dark pink, "regulatory proteins Syn" = light pink).


Solution

  • If you don't mind setting the palette manually, you could do something like this:

    library(ggplot2)
    library(colorspace)
    
    df <- data.frame(diamonds)
    df$binary_dummy <- sample(c(0, 1), nrow(df), replace = TRUE)
    
    pal <- scales::brewer_pal(palette = "Set1")(nlevels(df$color))
    pal <- c(rbind(pal, darken(pal, amount = 0.2)))
    
    ggplot(df, aes(x = cut, fill = interaction(binary_dummy, color))) +
      geom_bar(position = "fill") +
      scale_fill_manual(values = pal)
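
    The palette lines up with the fill levels because c(rbind(...)) interleaves the two colour vectors in the same order in which interaction() lists its levels. A small base-R sketch of that pairing, using hypothetical stand-in colour names instead of the Set1 palette and darken():

    ```r
    # Hypothetical stand-in colours, so the sketch needs no extra packages
    base_cols <- c("red", "blue")
    dark_cols <- c("darkred", "darkblue")

    # rbind() stacks the vectors as rows; c() reads the matrix column by
    # column, which interleaves them: base, dark, base, dark
    pal <- c(rbind(base_cols, dark_cols))

    # interaction() varies its first argument fastest: 0.D, 1.D, 0.E, 1.E
    lv <- levels(interaction(c(0, 1), c("D", "E")))

    # Pair each level with its colour
    setNames(pal, lv)
    ```

    With this ordering, binary_dummy = 0 gets the original colour and binary_dummy = 1 gets the darkened one.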
    

    Created on 2020-04-14 by the reprex package (v0.3.0)

    EDIT: To keep each interaction level tied to its colour even when a level is missing from the data, you can set a named palette, e.g.:

    pal <- setNames(pal, levels(interaction(df$binary_dummy, df$color)))
    
    # Remove one combination so that a level is missing from the data
    df <- df[!(df$binary_dummy == 0 & df$color == "E"),]
    
    ggplot(df, aes(x = cut, fill = interaction(binary_dummy, color))) +
      geom_bar(position = "fill") +
      scale_fill_manual(values = pal)
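
    Why the names matter can be sketched in base R. This is only an illustration of position-based versus name-based matching, not ggplot2's internal code, and it assumes (as I understand the scale's default behaviour) that unnamed values are handed out in order to the levels actually present. The colour names are hypothetical stand-ins:

    ```r
    # Hypothetical two-colour palette, named by interaction level
    lv  <- levels(interaction(c(0, 1), c("D", "E")))
    pal <- setNames(c("red", "darkred", "blue", "darkblue"), lv)

    # Suppose the combination "0.E" no longer occurs in the data
    present <- setdiff(lv, "0.E")

    # Positional matching hands out the first three colours in order,
    # so "1.E" ends up with "blue" instead of its own "darkblue"
    by_position <- setNames(unname(pal)[seq_along(present)], present)

    # Name matching keeps every remaining level on its original colour
    by_name <- pal[present]
    ```

    Comparing by_position with by_name shows the shift that the named palette avoids.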
    
    

    Alternatively, you can also set the breaks of the scale:

    ggplot(df, aes(x = cut, fill = interaction(binary_dummy, color))) +
      geom_bar(position = "fill") +
      scale_fill_manual(values = pal, breaks = levels(interaction(df$binary_dummy, df$color)))