r ggplot2 tidyverse geom-bar aesthetics

Why is my R bar chart (geom_bar) not appropriately filling color based on the provided variable?


This is a bit of a newbie question. I am using the package "nycflights13" in R, and "tidyverse".

library(nycflights13)  
library(tidyverse)

I am trying to get a bar chart that shows the total number of flights by airline/carrier, and have it color each bar by the number of flights that occurred each month.

I can get a simple bar chart to show with the following:

ggplot(flights) +  
    geom_bar(mapping=aes(x=carrier))

When I try to color it with the month, it doesn't change anything.

ggplot(flights) +  
    geom_bar(mapping=aes(x=carrier, fill=month))

The graph generated by the code above looks exactly the same.

It seems to work when I do the opposite... if I create a chart with "month" on the x-axis and color by carrier, it works just like I would expect.

ggplot(flights) +  
    geom_bar(mapping=aes(x=month,fill=carrier))

I assume it has something to do with discrete vs continuous variables?


Solution

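  • Yes, this has to do with discrete vs continuous variables. month is stored as an integer, so ggplot2 treats it as continuous. A continuous fill does not define groups, so stat_count drops it and each bar is drawn in a single color. as.factor() converts month to a discrete factor, which gives one stacked segment per month.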

    ggplot(flights) + 
        geom_bar(mapping=aes(x=carrier, fill=as.factor(month))) 
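
    To see why the plain fill=month is ignored, it can help to check how the column is stored. The following is a minimal sketch, assuming nycflights13 and ggplot2 are installed. The copy flights_f, the column month_f, and the month.abb labels are illustrative choices, not part of the original answer.

    ```r
    library(nycflights13)
    library(ggplot2)

    # month is stored as an integer, so ggplot2 treats it as continuous
    class(flights$month)

    # Work on a copy and add a labelled factor (hypothetical names: flights_f, month_f)
    flights_f <- flights
    flights_f$month_f <- factor(flights_f$month, levels = 1:12, labels = month.abb)

    # A discrete fill splits each carrier's bar into one segment per month
    p <- ggplot(flights_f) +
      geom_bar(mapping = aes(x = carrier, fill = month_f)) +
      labs(fill = "month")
    p
    ```

    Setting the levels explicitly keeps the legend in calendar order rather than alphabetical order.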
    

    For fun, there is also a way to override geom_bar's built-in stat_count default. This requires adding a dummy variable to flights to use as y, and sorting the data by month (otherwise you get odd artifacts, because the rows are stacked in data order). See the help page with ?geom_bar.

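    # Dummy column: each flight contributes a height of 1
    flights$n <- 1

    flights %>%
      arrange(month) %>%  # sort so each month's rows stack together
      ggplot(aes(carrier, n, fill = month)) +
      geom_bar(stat = "identity") +
      scale_fill_continuous(low = "blue", high = "red")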
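
    If the goal is a continuous color scale without the dummy column, two variants may be simpler. This is a sketch under the same package assumptions, not the original answer's method. The first pre-counts with dplyr::count() and draws the counts with geom_col(), which is the same as geom_bar(stat = "identity"). The second keeps stat_count but adds a group aesthetic so each month is counted separately and the fill survives. The names monthly, p1, and p2 are illustrative.

    ```r
    library(nycflights13)
    library(dplyr)
    library(ggplot2)

    # Use the packaged data explicitly, in case a local copy already has a column n
    # One row per carrier/month, with the number of flights in n
    monthly <- count(nycflights13::flights, carrier, month)

    # Variant 1: plot the precomputed counts; segments stack within each carrier
    p1 <- ggplot(monthly, aes(x = carrier, y = n, fill = month)) +
      geom_col() +
      scale_fill_gradient(low = "blue", high = "red")

    # Variant 2: let geom_bar count, grouping by month so the continuous fill is kept
    p2 <- ggplot(nycflights13::flights, aes(x = carrier, fill = month, group = month)) +
      geom_bar() +
      scale_fill_gradient(low = "blue", high = "red")

    p1
    p2
    ```

    Both variants draw at most twelve segments per carrier instead of one rectangle per flight, so they should render noticeably faster than the dummy-column version.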