r ggplot2 fill group geom-histogram

geom_histogram cluster values with same fill category together


I'm trying to create a histogram which uses one column in the data set for the fill colour and another column in the data set for the groups. Both of these are defined within the aes(). Then I add a white border which goes around each group. There are more groups than fill categories.

My problem is that when I define groups, the sub-groups within the same fill category are not stacked together in the bars - they seem to be following a random order. I tried ordering the data.frame by the fill column before passing it to ggplot() but this doesn't help.

How can I display observations with the same fill category together even when they are in different sub-groups and from different dates (my x axis is dates)?

Here is some example data:

# Set seed for reproducibility:
set.seed(123)

# Set start date:
start_date <- as.Date("2024-01-01")

# Set end date:
end_date <- as.Date("2024-04-01")

# Create data.frame:
data <- data.frame(
  onset_date = sample(seq(start_date, end_date, by = "day"), 
                      100, 
                      replace = TRUE),
  category = sample(c("A", "B"), 
                    100, 
                    replace = TRUE))

# Add row names for grouping:
data$grouping <- as.numeric(row.names(data))

# Create epicurve_breaks
epicurve_breaks <- seq.Date(
  from = start_date, 
  to = end_date, 
  by = "week")

Here is the histogram without groups:

# Load ggplot2 for plotting:
library(ggplot2)

p1 <- ggplot(data,
             aes(x = onset_date, fill = category)) +
  geom_histogram(breaks = epicurve_breaks,
                 closed = 'left',
                 colour = "white")

This gives the following plot. As you can see, the entities from the same fill category are stacked together:

Plot without subgroups

Here is the code for the plot when I add groups:

p2 <- ggplot(data, 
             aes(x = onset_date, fill = category, group = grouping)) +
  geom_histogram(breaks = epicurve_breaks, 
                 closed = 'left', 
                 colour = "white")

Here is the plot with groupings - now the category A and B squares are no longer clustered together on the bar:

Plot with subgroups

Any advice on how I can keep the categories grouped together even when there are subgroups within categories would be much appreciated.


Solution

  • The stacking order follows the factor levels of the group variable. Using forcats, we can reorder those levels by category, e.g.:

    library(forcats)

    ggplot(
      data, 
      aes(onset_date, fill = category, group = fct_reorder(factor(grouping), category))
    ) +
      geom_histogram(
        breaks = epicurve_breaks, 
        closed = 'left', 
        colour = "white"
      ) + 
      coord_fixed(6)
    

    (I'm not sure why my data looks different, I used your seed.)

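The same idea can be sketched in base R, without forcats, to make the mechanism visible. This is only a minimal sketch under stated assumptions: it rebuilds the example data from the question, and the names level_order, grouping_ordered, category_by_level and p3 are made up for illustration. It assumes, as the answer above does, that stacking follows the level order of the group factor.

```r
# Rebuild the example data from the question (same seed and calls).
set.seed(123)
start_date <- as.Date("2024-01-01")
end_date <- as.Date("2024-04-01")
data <- data.frame(
  onset_date = sample(seq(start_date, end_date, by = "day"), 100, replace = TRUE),
  category = sample(c("A", "B"), 100, replace = TRUE))
data$grouping <- as.numeric(row.names(data))

# In the original numeric order of `grouping`, categories A and B are
# interleaved, which matches the mixed-up stacking seen in the second plot.
interleaved <- is.unsorted(data$category[order(data$grouping)])

# Hypothetical helper column: list the group ids sorted by category
# (ties broken by id) and use that as the factor's level order.
level_order <- data$grouping[order(data$category, data$grouping)]
data$grouping_ordered <- factor(data$grouping, levels = level_order)

# Reading the categories off in level order gives all "A" groups first,
# then all "B" groups.
category_by_level <- data$category[match(level_order, data$grouping)]

# Plot only if ggplot2 is installed; the rest of the sketch is base R.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  epicurve_breaks <- seq.Date(from = start_date, to = end_date, by = "week")
  p3 <- ggplot(data,
               aes(x = onset_date, fill = category, group = grouping_ordered)) +
    geom_histogram(breaks = epicurve_breaks, closed = "left", colour = "white")
}
```

Sorting the rows of the data.frame does not change the values in grouping, which is probably why reordering the data before calling ggplot() had no visible effect. Changing the level order of the group factor is what moves the squares.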