Tags: r, ggplot2, histogram, overlap

Histogram that starts strictly at the minimum value of the dataset and ends strictly at the maximum value


Using ggplot2 in R, I want to plot a histogram whose first bin starts strictly at the minimum value of the dataset and whose last bin ends strictly at the maximum value.

When I add vertical lines at the minimum and maximum, the histogram bins overlap those values. I have tried narrowing the bins, changing the number of bins, and reducing the space between them, but nothing helped.

bins = 5
bwidth =  (max(data$deltaQ)-min(data$deltaQ))/bins
ggplot(data=data ) +
  geom_histogram(
    mapping=aes(x=deltaQ)   # refer to the column by name inside aes()
    , binwidth = bwidth 
    , na.rm = TRUE
    , fill = "yellow"
    , color = "black" 
    , position="stack"   # other options: "identity", "dodge"
    , boundary=0
  )+
  geom_vline(xintercept = min(data$deltaQ) , color = "green" , na.rm = TRUE, mapping=aes(size=5)  )+
  geom_vline(xintercept = max(data$deltaQ) , color = "green" , na.rm = TRUE, mapping=aes(size=5))+
  geom_vline(mapping=aes(size=5)  , xintercept = min(data$deltaQMin) , color = "red" , na.rm = TRUE, linetype = "longdash")+
  geom_vline(mapping=aes(size=5)  , xintercept = max(data$deltaQMin) , color = "red" , na.rm = TRUE, linetype = "longdash")+
  geom_vline(mapping=aes(size=5)  , xintercept = max(data$deltaQMax) , color = "red" , na.rm = TRUE, linetype = "longdash")+
  geom_vline(mapping=aes(size=5)  , xintercept = min(data$deltaQMax) , color = "red" , na.rm = TRUE, linetype = "longdash")+
  xlim(-50,50)

As far as I can tell, hist() and geom_histogram() currently place a bin center on the minimum and on the maximum, which causes the overlap. I need to make sure that no bin crosses the minimum or maximum value.


Solution

  • Try setting the boundary argument in your call to geom_histogram() to the min() or max() of the data.

    Using the diamonds dataset from ggplot2, you can see that setting boundary to min(diamonds$carat) puts bin edges at the minimum and maximum values of the data. Setting it to max(diamonds$carat) does the same.

    library(tidyverse)
    
    data(diamonds)
    diamonds <- filter(diamonds, carat <= 1)
    
    ggplot(diamonds, aes(x = carat)) +
      geom_histogram(boundary = min(diamonds$carat)) +
      geom_vline(aes(xintercept = min(carat)), color = 'red') +
      geom_vline(aes(xintercept = max(carat)), color = 'red')
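
    For the setup in the question (a fixed number of bins spanning exactly the data range), an alternative is to pass explicit bin edges through the breaks argument, which geom_histogram() forwards to stat_bin(). The following is a minimal sketch under assumptions: the data frame and its simulated deltaQ values are made up for illustration and stand in for the asker's data.

    ```r
    library(ggplot2)

    # Hypothetical stand-in for the asker's data frame; only deltaQ is used.
    set.seed(1)
    data <- data.frame(deltaQ = runif(200, min = -37.5, max = 42.3))

    bins <- 5
    rng  <- range(data$deltaQ, na.rm = TRUE)

    # bins + 1 equally spaced edges: the first edge is the minimum,
    # the last edge is the maximum, so no bin can cross either value.
    edges <- seq(rng[1], rng[2], length.out = bins + 1)

    p <- ggplot(data, aes(x = deltaQ)) +
      geom_histogram(breaks = edges, fill = "yellow", color = "black",
                     na.rm = TRUE) +
      geom_vline(xintercept = rng, color = "green")

    # Bins as ggplot2 actually drew them (one row per bin).
    built <- layer_data(p, 1)
    range(c(built$xmin, built$xmax))
    ```

    layer_data() returns the bins as drawn, so the smallest xmin and the largest xmax can be compared directly with range(data$deltaQ). Printing p shows the plot itself.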
    
