
Create custom (equally spaced) bins in ggplot for data with gaps


I want to make a histogram of data with large gaps between "clusters/groups" of values, but dividing the data into equal-sized bins has been difficult. I'd like zero (0) to have its own bin, the total number of equally spaced bins to be fewer than 8 (ideally, to avoid crowding the plot), and an extra empty bin labelled "..." to signify the large gaps between the data values. The actual dataset has 800+ zeros, with maybe 5% of the data > 0. Naturally the zeros will overshadow the rest of the data, but a log transform will fix that. I just can't figure out the best way to break up the data...

Data looks like this:

library(tidyverse)  # dplyr, tibble, forcats, ggplot2

set.seed(123)

zero <- runif(50, min=0, max=0)
small <- runif(7, min=0, max=0.1)
medium <- runif(5, min=0, max=0.5)
high <- runif(3, min=1.5, max=2.5)

f <- function(x){
  return(data.frame(ID=deparse(substitute(x)), value=x))
}

all <- bind_rows(f(zero), f(small), f(medium), f(high))
all <- as.data.frame(all[,-1])
names(all)[1] <- "value"

My attempt:

bins <- all %>% mutate(bin = cut(all$value, breaks = c(0, seq(0.01:0.4), Inf), right = FALSE)) %>%
  count(bin, name = "freq") %>%
  add_row(bin = "...", freq = NA_integer_) %>%
  mutate(bin = fct_relevel(bin, "...", after = 0.4))

But I get this error:

Error in `mutate()`:
! Problem while computing `bin = fct_relevel(bin, "...", after = 0.4)`.
Caused by error:
! `idx` must contain one integer for each level of `f`

This is not equally spaced, but I'm looking for something like this as labels for my plot:

levels(bins$bin) <- c("0", "0.01-0.05", "0.05-0.1", "0.1-0.2", "0.2-0.3", "0.3-0.4", "...", "2.0+")

ggplot(bins, aes(x = bin, y = freq, fill = bin)) + 
  geom_col(colour = "black")

Solution

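  • You can call cut() directly inside aes(). Setting include.lowest = TRUE closes the first interval on the left, so the zeros fall into [0, 0.25] instead of becoming NA:

    ggplot(all, aes(cut(value, breaks = c(0, 0.25, 0.5, 3), include.lowest = TRUE))) + 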
      geom_bar() + 
      scale_y_log10() +
      labs(x = "value")
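
For the layout described in the question (a bin for zero, a few equal-width bins, a "..." gap bin, and an open-ended top bin), one possible sketch is to give cut() explicit breaks and labels. The edges, the labels, and the 1.5 cutoff below are assumptions chosen to fit the example data; they are not part of the answer above and would need adjusting for the real dataset.

```r
library(ggplot2)

set.seed(123)
# Data shaped like the question's: 50 zeros plus three small clusters
all <- data.frame(value = c(rep(0, 50),
                            runif(7, min = 0, max = 0.1),
                            runif(5, min = 0, max = 0.5),
                            runif(3, min = 1.5, max = 2.5)))

# Assumed edges: (-Inf, 0] holds the zeros, five 0.1-wide bins follow,
# (0.5, 1.5] is the empty gap shown as "...", and (1.5, Inf] is the top bin
edges <- c(-Inf, 0, seq(0.1, 0.5, by = 0.1), 1.5, Inf)
bin_labels <- c("0", "0-0.1", "0.1-0.2", "0.2-0.3", "0.3-0.4", "0.4-0.5",
                "...", "1.5+")

# cut() uses right-closed intervals by default, so exact zeros land in "0"
all$bin <- cut(all$value, breaks = edges, labels = bin_labels)

p <- ggplot(all, aes(x = bin)) +
  geom_bar(colour = "black") +
  scale_x_discrete(drop = FALSE) +  # keep empty bins such as "..." on the axis
  scale_y_log10() +
  labs(x = "value", y = "count (log scale)")
print(p)
```

Here "..." is an ordinary, empty interval rather than a row added afterwards, so fct_relevel() is not needed. On a log scale, a bin holding a single observation has height log10(1) = 0, so it may show little or no bar.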
    
