I am trying to do a histogram zoomed on part of the data. My problem is that I would like to grup everything that is outside the range into last category "10+". Is it possible to do it using ggplot2?
Sample code:
x <- data.frame(runif(10000, 0, 15))
ggplot(x, aes(runif.10000..0..15.)) +
geom_histogram(aes(y = (..count..)/sum(..count..)), colour = "grey50", binwidth = 1) +
scale_y_continuous(labels = percent) +
coord_cartesian(xlim=c(0, 10)) +
scale_x_continuous(breaks = 0:10)
Here is how the histogram looks now: How the histogram looks now
And here is how I would like it to look: How the histogram should look
Probably it is possibile to do it by nesting ifelses, but as I have in my problem more cases is there a way for ggplot to do it?
You could use forcats
and dplyr
to efficiently categorize the values, aggregate the last "levels" and then compute the percentages before the plot. Something like this should work:
library(forcats)
library(dplyr)
library(ggplot2)
x <- data.frame(x = runif(10000, 0, 15))
x2 <- x %>%
mutate(x_grp = cut(x, breaks = c(seq(0,15,1)))) %>%
mutate(x_grp = fct_collapse(x_grp, other = levels(x_grp)[10:15])) %>%
group_by(x_grp) %>%
dplyr::summarize(count = n())
ggplot(x2, aes(x = x_grp, y = count/10000)) +
geom_bar(stat = "identity", colour = "grey50") +
scale_y_continuous(labels = percent)
However, the resulting graph is very different from your example, but I think it's correct, since we are building a uniform distribution: