
R: ggplot: how to assign a color to one variable that consistently appears across plots, and the rest random


If I have this data frame and want to make a bar chart from it:

df <-
    setNames(data.frame(
        as.POSIXct(
            c(
            "2022-07-29 00:00:00",
            "2022-07-29 00:00:05",
            "2022-07-29 00:05:00",
            "2022-07-29 00:05:05",
            "2022-07-29 00:10:00",
            "2022-07-29 00:15:00",
            "2022-07-29 00:20:00",
            "2022-07-29 00:20:05"
            )),
        c(1, 2, 3, 4, 5, 6, 7, 8),
        c("a", "a", "b847", "b317", "b317", "bob680", "bf456", "c3400")
    ),
    c("timeStamp", "value1", "text"))

df %>% ggplot(aes(x = fct_infreq(text), fill = text)) + geom_bar(stat = "count", aes(y = after_stat(count / sum(count))))

How can I make "a" in "text" ALWAYS get a certain color? The catch is that I cannot just create a palette and use

scale_fill_manual(values=palette). 

The reason is that "a" is the only value in "text" that shows up consistently each time new data comes in and I need to plot it. I tried to show this with the random character/number strings that make up the rest of "text". I want "a" to always be a specific color, e.g. red, and every other incoming UNKNOWN value (which is why I can't use a manual palette) to get any arbitrary color.

Also, bonus points for making the plot always show only the top 5 most frequent values in "text" besides "a". I know this is possible using something like

'%!in%' <- Negate('%in%')


df$text[df$text %!in% c("b847", "b317", "b317", "bob680", "bf456", "c3400")] <- "a"

As you can see, that is only possible AFTER I have run the plot, identified the top 5 manually, and then changed the base data frame to relabel all other values as "a". I figure there must be a better, more automated way to do this, since this is surely a common need in geom_bar charts, but I can't seem to figure out a solution.

Thanks


Solution

  • Use relevel() or forcats::fct_relevel() so that "a" is always the first level; then provide an unnamed manual palette with red as the first value.

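    library(dplyr)
    library(forcats)
    library(ggplot2)
    library(scales)
    
    df %>% 
      mutate(
        # order the bars by frequency
        text = fct_infreq(text),
        # same values, but with "a" moved to the first level for the fill scale
        text_fill = fct_relevel(text, "a")
      ) %>% 
      ggplot(aes(x = text, fill = text_fill)) + 
      # after_stat() replaces the deprecated ..count.. notation (ggplot2 >= 3.3.0)
      geom_bar(aes(y = after_stat(count / sum(count)))) +
      # unnamed values are matched to levels in order, so "a" gets red;
      # Set2 has at most 8 colours, so this covers up to 8 other levels
      scale_fill_manual(values = c("red", brewer_pal(palette = "Set2")(8)))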
    

    Note: I created a separate text_fill column so that the "a" bar still appears in its frequency-ordered position on the x axis even when it is not the most frequent value. If you don’t care about this and want "a" to always appear first, just relevel text itself.
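
    For the bonus part (top 5 besides "a") and for data with an unknown number of other values, one possible extension is sketched below. It is not part of the original answer: keep_top_n() and make_palette() are hypothetical helper names, the "Other" label and the hue range are my own choices, and the small df only reuses the text values from the question. The idea is to lump everything except "a" and the n most frequent other values with forcats::fct_other(), then build a named palette so that "a" maps to red regardless of level order.

```r
library(dplyr)
library(forcats)
library(ggplot2)
library(scales)

# Example data: only the text column from the question is needed here.
df <- data.frame(
  text = c("a", "a", "b847", "b317", "b317", "bob680", "bf456", "c3400")
)

# Hypothetical helper: keep `always` plus the n most frequent other values;
# everything else is lumped into "Other" (an assumed label).
keep_top_n <- function(x, always = "a", n = 5) {
  counts <- sort(table(x[x != always]), decreasing = TRUE)
  top <- names(counts)[seq_len(min(n, length(counts)))]
  fct_other(factor(x), keep = c(always, top), other_level = "Other")
}

# Hypothetical helper: named palette in which `always` is red and every
# other level gets a generated hue, so the number of levels need not be known.
make_palette <- function(lvls, always = "a", always_colour = "red") {
  others <- setdiff(lvls, always)
  # hue_pal() rejects a request for zero colours, so guard that case;
  # the hue range 90-330 is chosen to stay away from red tones
  other_cols <- if (length(others) > 0) {
    hue_pal(h = c(90, 330))(length(others))
  } else {
    character(0)
  }
  setNames(c(always_colour, other_cols), c(always, others))
}

plot_df <- df %>%
  mutate(text = fct_infreq(keep_top_n(text, always = "a", n = 5)))

# Named values are matched to levels by name, so no separate fill column
# is needed to keep "a" red.
p <- ggplot(plot_df, aes(x = text, fill = text)) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_fill_manual(values = make_palette(levels(plot_df$text)))

# print(p) draws the chart
```

    Because the palette is named, "a" stays red even if it is not the first level, and the bars remain ordered by frequency. With this approach the "Other" bar is ordered by frequency like any other bar; if it should always come last, it would need to be releveled separately.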