Tags: r, ggplot2, histogram, visualization, overlap

How to draw a two-group histogram whose overlapping bars are split diagonally, using ggplot2 in R


I want the overlapping part of a histogram to show each bar divided by a diagonal line. For example, I created the following code:

library(ggplot2)

set.seed(1)
grupo1 <- round(rnorm(100, mean = 20, sd = 2.2))
grupo2 <- round(rnorm(100, mean = 10, sd = 2))

df <- data.frame(
  valores = c(grupo1, grupo2),
  grupo = c(rep("grupo1", length(grupo1)), rep("grupo2", length(grupo2)))
)

# Create the histogram
ggplot(df, aes(x = valores, fill = grupo)) +
  geom_histogram(binwidth = 1, color = "black", position = "identity", alpha = 0.6) +
  labs(x = "Valores", y = "Frecuencia", fill = "Grupo") +
  scale_fill_manual(values = c("grupo1" = "blue", "grupo2" = "red")) +
  theme_minimal()

This code produces a plot in which the overlapping part of the bars is drawn in a different (blended) color.

But I want the plot to look like the version I edited in Paint, where each overlapping bar is split by a diagonal line so that both colors are visible. Does anyone know how to draw the histogram that way?

A second example, following the same code:

    df2 <- rbind(df, data.frame(valores = c(15,15), grupo = c("grupo1", "grupo1")))

This gives a similar result, but again I want the overlapping bars split diagonally (as in my Paint mock-up) so that both colors show.

I have tried some geom_histogram arguments, such as changing position, but it does not work. I'm looking for code that solves this. Thanks in advance!


Solution

  • ggplot2 doesn't handle this kind of thing well. There is a related question about adding textures to bar plots, which is similar to what you want, and doing it turns out to be quite difficult.

    I've come up with some rather unusual solutions; there might be a much easier approach that I don't know of.

    Note: the dummy data (df2) is at the end.

    Option 1 - bars with combined colors, but not combined heights

    This is basically what you said you didn't want, but with an improvement: the combined bar doesn't show the sum of the heights. For example, if we run:

    ggplot(df2, aes(valores, fill = grupo)) +
      geom_histogram(binwidth = 1, color = "black", alpha=0.6)
    

    The bar at valores = 15 will be stacked to a height of 3 + 1, but we might prefer a count = 3 bar with a separate, smaller count = 1 bar overlapping its lower part. We can get that using position_dodge() while requesting no actual dodge:

    ggplot(df2, aes(valores, fill = grupo)) +
      geom_histogram(position = position_dodge(0), binwidth = 1, color = "black", alpha=0.6)
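
A quick way to see the difference between the default stacking and position_dodge(0) is to compare the computed layer data. This is a sketch, assuming ggplot2's layer_data() (which returns a layer's data after the stat and position adjustments) and the dummy data from the end of this answer:

```r
library(ggplot2)

# Same dummy data as at the end of this answer
set.seed(1)
grupo1 <- round(rnorm(100, mean = 20, sd = 2.2))
grupo2 <- round(rnorm(100, mean = 10, sd = 2))
df <- data.frame(valores = c(grupo1, grupo2),
                 grupo = c(rep("grupo1", length(grupo1)), rep("grupo2", length(grupo2))))
df2 <- rbind(df, data.frame(valores = c(15, 15), grupo = c("grupo1", "grupo1")))

base <- ggplot(df2, aes(valores, fill = grupo))

# Computed data of each histogram layer (default position is "stack")
stacked <- layer_data(base + geom_histogram(binwidth = 1))
dodged0 <- layer_data(base + geom_histogram(position = position_dodge(0), binwidth = 1))

# Stacked: the top of each column equals the sum of both groups' counts in that bin
stack_top <- tapply(stacked$ymax, stacked$x, max)
count_sum <- tapply(stacked$count, stacked$x, sum)
all.equal(as.numeric(stack_top), as.numeric(count_sum))

# position_dodge(0): every bar starts at 0 and is only as tall as its own group's count
all(dodged0$ymin == 0 & dodged0$ymax == dodged0$count)
```

Both checks should return TRUE, which is the "3 + 1 versus 3 and 1" behaviour described above.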
    


    We can also add a small amount of dodge, for example with position_dodge(0.3).
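The position_dodge(0.3) version only changes the position argument. A sketch, again assuming the dummy data from the end of the answer; inspecting layer_data() shows that the two groups' bars are shifted in opposite directions from the bin centres (which are integers here because binwidth = 1):

```r
library(ggplot2)

set.seed(1)
grupo1 <- round(rnorm(100, mean = 20, sd = 2.2))
grupo2 <- round(rnorm(100, mean = 10, sd = 2))
df <- data.frame(valores = c(grupo1, grupo2),
                 grupo = c(rep("grupo1", length(grupo1)), rep("grupo2", length(grupo2))))
df2 <- rbind(df, data.frame(valores = c(15, 15), grupo = c("grupo1", "grupo1")))

p <- ggplot(df2, aes(valores, fill = grupo)) +
  geom_histogram(position = position_dodge(0.3), binwidth = 1, color = "black", alpha = 0.6)
print(p)

# Offset of each bar's centre from its integer bin centre, per group
bars <- layer_data(p)
offsets <- round(bars$x - round(bars$x), 3)
table(bars$group, offsets)
```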


    The problem with this solution is that it adds whitespace between the bars. There might be a geom_histogram option that removes it, but I don't know of one; you could open a new question about that.

    Option 2 - faking bars using geom_area + geom_segment

    You can build a new dataset with the values of the histogram, then you get more flexibility to customize the bars. You could do that with hist(), but since we're using ggplot, I made individual histograms for each group, and got their data with ggplot_build(). There might be a better way to do that, the important part is that at the end you have a dataset with the histogram values for each group.

    library(dplyr)
    library(purrr)
    library(tidyr)
    library(ggplot2)
    
    df_area <- df2 %>%
      group_split(grupo) %>% # for each group
      map_dfr(function(df_group){ # apply the following function
        g <- ggplot(df_group, aes(valores)) +
          geom_histogram(binwidth = 1) # build a histogram
        
        ggplot_build(g)$data[[1]] %>% # get its data
          select(c(x, xmin, xmax, y)) %>% # select these columns
          mutate(grupo = unique(df_group$grupo)) %>% # and add a 'grupo' column
          pivot_longer(c(xmin, xmax), values_to = "x_area") # one row per bin edge, so geom_area can trace each bar
      })
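
Since hist() is mentioned as an alternative to ggplot_build(), here is a sketch of that route. It assumes integer data and binwidth = 1 with bins centred on integers (breaks at half-integers, computed over each group's own range), which should mirror what the per-group geom_histogram(binwidth = 1) calls above compute. hist_to_area() is a hypothetical helper name; the result has the same columns as df_area (x, y, grupo, name, x_area) and needs no dplyr/purrr/tidyr:

```r
set.seed(1)
grupo1 <- round(rnorm(100, mean = 20, sd = 2.2))
grupo2 <- round(rnorm(100, mean = 10, sd = 2))
df <- data.frame(valores = c(grupo1, grupo2),
                 grupo = c(rep("grupo1", length(grupo1)), rep("grupo2", length(grupo2))))
df2 <- rbind(df, data.frame(valores = c(15, 15), grupo = c("grupo1", "grupo1")))

# hist_to_area() is a hypothetical helper, not part of any package
hist_to_area <- function(values, group_name, binwidth = 1) {
  # Breaks from half a bin below the minimum to half a bin above the maximum,
  # so each bin is centred on an integer (assumes integer data)
  breaks <- seq(min(values) - 0.5 * binwidth, max(values) + 0.5 * binwidth, by = binwidth)
  h <- hist(values, breaks = breaks, plot = FALSE)
  n_bins <- length(h$counts)
  # Two rows per bin: one at its left edge (xmin), one at its right edge (xmax)
  data.frame(
    x      = rep(h$mids, each = 2),
    y      = rep(h$counts, each = 2),
    grupo  = group_name,
    name   = rep(c("xmin", "xmax"), times = n_bins),
    x_area = as.vector(rbind(head(h$breaks, -1), tail(h$breaks, -1)))
  )
}

groups <- split(df2$valores, df2$grupo)
df_area_hist <- do.call(rbind, Map(hist_to_area, groups, names(groups)))
rownames(df_area_hist) <- NULL
head(df_area_hist)
```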
    

    Now we can build the area of the histogram with geom_area and the bar edges with geom_segment. Again we use position_dodge(0), but this time there is no whitespace:

    ggplot(df_area, aes(x_area, y, fill = grupo)) +
      geom_area(position = position_dodge(0), alpha = 0.6, color = "black") +
      geom_segment(aes(y = 0, yend = y, x = x_area, xend = x_area))
    


    This may produce slightly odd boundaries where the area outline meets the segments. Also, you can't use position_dodge(k) with k != 0 here.

    Option 3 - geom_area + geom_segment with custom data

    This is the closest to what you wanted. The idea is to modify the data:

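    # grupo1 keeps its count only at the left edge of each bin (xmin) and grupo2 only at
    # the right edge (xmax); the other edge drops to 0, turning each bar into a triangle
    df_area2 <- df_area %>%
      mutate(y = case_when(grupo == "grupo1" ~ ifelse(name == "xmin", y, 0),
                           grupo == "grupo2" ~ ifelse(name == "xmax", y, 0)))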
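
To see what the case_when() call does to a single bin, here is a toy data frame with the same columns as df_area. The numbers are made up for illustration, not taken from df2:

```r
library(dplyr)

# Hypothetical single bin centred at 15: grupo1 has count 3, grupo2 has count 1
toy <- data.frame(
  x      = c(15, 15, 15, 15),
  y      = c(3, 3, 1, 1),
  grupo  = c("grupo1", "grupo1", "grupo2", "grupo2"),
  name   = c("xmin", "xmax", "xmin", "xmax"),
  x_area = c(14.5, 15.5, 14.5, 15.5)
)

toy2 <- toy %>%
  mutate(y = case_when(grupo == "grupo1" ~ ifelse(name == "xmin", y, 0),
                       grupo == "grupo2" ~ ifelse(name == "xmax", y, 0)))

# grupo1 now runs from (14.5, 3) down to (15.5, 0); grupo2 from (14.5, 0) up to (15.5, 1)
toy2[, c("grupo", "x_area", "y")]
```

When geom_area connects each pair of points, the edge between them is sloped, which is what produces the inclined half bars.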
    

    This turns each bar into an inclined half bar (a triangle spanning its bin).


    Then we draw that on top of the base plot from Option 2:

    ggplot(df_area, aes(x_area, y, fill = grupo)) +
      geom_area(position = position_dodge(0), color = "black") +
      geom_area(data = df_area2) + #on top of the base area, but below the lines
      geom_segment(aes(y = 0, yend = y, x = x_area, xend = x_area))
    


    Be careful with alpha here: with transparency, the layers underneath show through the inclined bars. Instead, you can pass "washed out" colors to fill to match the tone you had before.
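One way to get such "washed out" colors is to pre-blend each fill color with the background, which is how alpha = 0.6 looks over a white background, but as an opaque color. A sketch; wash_out() is a hypothetical helper built only on base grDevices functions, and it assumes a white background:

```r
# wash_out() is a hypothetical helper: it mixes each color with the background
# the way an alpha-blended fill would look over it, but returns an opaque color,
# so overlapping layers no longer blend with each other.
wash_out <- function(col, alpha = 0.6, bg = "white") {
  fg <- grDevices::col2rgb(col)                                     # 3 x n matrix, 0-255
  back <- grDevices::col2rgb(bg)[, rep(1, ncol(fg)), drop = FALSE]  # background repeated per color
  mixed <- alpha * fg + (1 - alpha) * back                          # linear blend per channel
  mixed <- pmin(pmax(mixed, 0), 255)                                # guard against rounding drift
  grDevices::rgb(t(mixed), maxColorValue = 255)
}

fills <- setNames(wash_out(c("blue", "red")), c("grupo1", "grupo2"))
fills
```

The named vector can then be passed to the plot with scale_fill_manual(values = fills).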

    Lastly, we can add horizontal lines closing the top of each bar:

    binwidth <- 1
    
    ggplot(df_area, aes(x_area, y, fill = grupo)) +
      geom_area(position = position_dodge(0), color = "black") +
      geom_area(data = df_area2) +
      geom_segment(aes(y = 0, yend = y, x = x_area, xend = x_area)) +
      geom_segment(aes(y = y, yend = y, x = x - 0.5*binwidth, xend = x + 0.5*binwidth))
    

    Here, binwidth is the same one you used to create the histogram data.


    Dummy data

    set.seed(1)
    grupo1 <- round(rnorm(100, mean = 20, sd = 2.2))
    grupo2 <- round(rnorm(100, mean = 10, sd = 2))
    
    df <- data.frame(valores = c(grupo1, grupo2),
                     grupo = c(rep("grupo1", length(grupo1)), rep("grupo2", length(grupo2))))
    
    df2 <- rbind(df, data.frame(valores = c(15,15), grupo = c("grupo1", "grupo1")))