
Grouping data outside limits in histogram using ggplot2


I am trying to draw a histogram zoomed in on part of the data. My problem is that I would like to group everything that falls outside the range into a final category, "10+". Is it possible to do this using ggplot2?

Sample code:

library(ggplot2)
library(scales)  # provides percent() for the y-axis labels

x <- data.frame(runif(10000, 0, 15))
ggplot(x, aes(runif.10000..0..15.)) + 
  geom_histogram(aes(y = (..count..)/sum(..count..)), colour = "grey50", binwidth = 1) + 
  scale_y_continuous(labels = percent) +
  coord_cartesian(xlim = c(0, 10)) +
  scale_x_continuous(breaks = 0:10)

Here is how the histogram looks now: [image: how the histogram looks now]

And here is how I would like it to look: [image: how the histogram should look]

It is probably possible to do this by nesting ifelse() calls, but my real problem has more cases than this example. Is there a way for ggplot to do it?


Solution

  • You could use forcats and dplyr to efficiently categorize the values, aggregate the last "levels" and then compute the percentages before the plot. Something like this should work:

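    library(forcats)
    library(dplyr)
    library(ggplot2)
    library(scales)  # provides percent() for the y-axis labels
    
    x <- data.frame(x = runif(10000, 0, 15))
    x2 <- x %>%
      # bin the values into unit-wide intervals (0,1], (1,2], ..., (14,15]
      mutate(x_grp = cut(x, breaks = seq(0, 15, 1))) %>% 
      # collapse levels 10 to 15, i.e. (9,10] through (14,15], into "other"
      mutate(x_grp = fct_collapse(x_grp, other = levels(x_grp)[10:15])) %>% 
      group_by(x_grp) %>% 
      dplyr::summarize(count = n())
    
    # plot the share of observations in each category
    ggplot(x2, aes(x = x_grp, y = count/10000)) + 
      geom_bar(stat = "identity", colour = "grey50") + 
      scale_y_continuous(labels = percent) 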
    

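    The resulting graph looks very different from your example, but I think it is correct: the data are drawn from a uniform distribution on [0, 15], so each unit-wide bin holds about 1/15 (roughly 6.7%) of the values, while the "other" category pools six bins and holds about 40%.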
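
If you would rather keep geom_histogram, one possible alternative is to clamp the values before plotting, so that everything above the cut-off lands in a single extra bin. This is only a sketch under a few assumptions: `upper` and `x_clamped` are names made up for this example, the cut-off is 10 as in the question, and `after_stat()` requires ggplot2 3.3.0 or later.

```r
library(ggplot2)
library(scales)  # provides percent() for the y-axis labels

set.seed(1)  # only to make the example reproducible
x <- data.frame(x = runif(10000, 0, 15))

upper <- 10  # hypothetical cut-off; values above it go into "10+"

# Clamp every value above the cut-off to the middle of the bin (10, 11],
# so that all of them are counted in one extra bar.
x$x_clamped <- pmin(x$x, upper + 0.5)

p <- ggplot(x, aes(x_clamped)) +
  # boundary = 0 aligns the bin edges with the integers 0, 1, 2, ...
  geom_histogram(aes(y = after_stat(count / sum(count))),
                 colour = "grey50", binwidth = 1, boundary = 0) +
  scale_y_continuous(labels = percent) +
  # one tick per bin centre; the last bin is relabelled "10+"
  scale_x_continuous(breaks = 0:upper + 0.5,
                     labels = c(paste0(0:(upper - 1), "-", 1:upper),
                                paste0(upper, "+")))

print(p)
```

There is no nested ifelse() here, because pmin() does the grouping in one step. If you prefer explicit categories, the same idea should also work with cut() by passing breaks = c(0:10, Inf), which puts everything above 10 into one final interval.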