
ggplot2 geom_bar position failure


I am using the ..count.. transformation in geom_bar and get the warning position_stack requires non-overlapping x intervals when some of my categories have few counts.

This is best explained using some mock data (my real data involves wind direction and wind speed, and I retain names relating to that).

library(ggplot2)

#make data
set.seed(12345)
FF=rweibull(100,1.7,1)*20  #mock speeds
FF[FF>60]=59
dir=sample.int(10,size=100,replace=TRUE) # mock directions

#group into speed classes
FFcut=cut(FF,breaks=seq(0,60,by=20),ordered_result=TRUE,right=FALSE,drop=FALSE)

# stuff into data frame & plot
df=data.frame(dir=dir,grp=FFcut)
ggplot(data=df,aes(x=dir,y=(..count..)/sum(..count..),fill=grp)) + geom_bar()

This works fine, and the resulting plot shows the frequency of directions grouped according to speed. Note that the velocity class with the fewest counts (here "[40,60)") has 5 counts.

Figure: three categories of size 20 each

However, using more velocity classes leads to a warning. For instance, with

FFcut=cut(FF,breaks=seq(0,60,by=15),ordered_result=TRUE,right=FALSE,drop=FALSE)
 

the velocity class with the fewest counts (now "[45,60)") has only 3 counts, and ggplot2 warns that

position_stack requires non-overlapping x intervals

and the plot shows the data in this category spread out along the x axis.

Figure: four categories of size 15 each. The last one, with three elements, is not added on top of the corresponding bar.

It seems that a group needs at least 5 counts for this to work correctly.
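The class counts quoted above can be checked with `table()`. This is a minimal sketch that reuses the mock speeds from the question; the names `counts20` and `counts15` are introduced here for illustration only.

```r
# Rebuild the mock speeds exactly as in the question
set.seed(12345)
FF <- rweibull(100, 1.7, 1) * 20
FF[FF > 60] <- 59

# Number of observations per speed class for both sets of breaks
counts20 <- table(cut(FF, breaks = seq(0, 60, by = 20), right = FALSE))
counts15 <- table(cut(FF, breaks = seq(0, 60, by = 15), right = FALSE))

print(counts20)  # three classes of width 20
print(counts15)  # four classes of width 15
```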

I would appreciate knowing if this is a feature or a bug in stat_bin (which geom_bar is using) or if I am simply abusing geom_bar.

Also, any suggestions how to get around this would be appreciated.


Solution

  • This occurs because df$dir is numeric, so ggplot assumes a continuous x-axis, and the group aesthetic defaults to the only discrete variable in the plot (fill = grp).

    As a result, when grp = [45,60) contains only a few distinct dir values, ggplot misjudges how wide the bars for that group should be. This becomes more visually obvious if we split the plot into different facets:

    ggplot(data=df,
           aes(x=dir,y=(..count..)/sum(..count..),
               fill = grp)) + 
      geom_bar() + 
      facet_wrap(~ grp)
    

    facet view

    > for(l in levels(df$grp)) print(sort(unique(df$dir[df$grp == l])))
    [1]  1  2  3  4  6  7  8  9 10
    [1]  1  2  3  4  5  6  7  8  9 10
    [1]  2  3  4  5  7  9 10
    [1] 2 4 7
    

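    The listing above shows the distinct df$dir values in each grp level. We can also check manually that the minimum difference between sorted df$dir values is 1 for the first three grp levels, but 2 for the last one. The default bar width, which is derived from the spacing of the x values, is therefore wider for that group, and its bars overlap the neighbouring x positions.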
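The manual check described above can be sketched in base R as follows. The helper `min_gap` and the object `gaps` are hypothetical names, not part of ggplot2. The exact dir values may differ from the listing above depending on the R version, because the default behaviour of `sample()` changed in R 3.6.0, so no expected output is shown.

```r
# Rebuild the mock data from the question (four speed classes, breaks every 15)
set.seed(12345)
FF <- rweibull(100, 1.7, 1) * 20
FF[FF > 60] <- 59
dir <- sample.int(10, size = 100, replace = TRUE)
grp <- cut(FF, breaks = seq(0, 60, by = 15), ordered_result = TRUE, right = FALSE)
df <- data.frame(dir = dir, grp = grp)

# Smallest gap between distinct dir values within one group;
# NA when a group has fewer than two distinct values
min_gap <- function(x) {
  u <- sort(unique(x))
  if (length(u) < 2) NA_real_ else min(diff(u))
}

# One value per speed class
gaps <- tapply(df$dir, df$grp, min_gap)
print(gaps)
```

A group whose gap is larger than that of the other groups is the one likely to get wider bars when no width is given.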

    The following solutions should all achieve the same result:

    1. Explicitly specify the same bar width for all groups in geom_bar():

    ggplot(data=df,
           aes(x=dir,y=(..count..)/sum(..count..),
               fill = grp)) + 
      geom_bar(width = 0.9)
    

    2. Convert dir to a categorical variable before passing it to aes(x = ...):

    ggplot(data=df,
           aes(x=factor(dir), y=(..count..)/sum(..count..),
               fill = grp)) + 
      geom_bar()
    

    3. Specify that the group aesthetic should be based on both df$dir and df$grp:

    ggplot(data=df,
           aes(x=dir,
               y=(..count..)/sum(..count..),
               group = interaction(dir, grp),
               fill = grp)) + 
      geom_bar()
    

    plot
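In recent ggplot2 releases the `..count..` notation is deprecated in favour of `after_stat()`. As a hedged sketch, option 2 could be written as below, assuming ggplot2 3.3.0 or later is installed. The object name `p` is illustrative, and the data are rebuilt so the block runs on its own.

```r
library(ggplot2)

# Rebuild the mock data from the question (four speed classes, breaks every 15)
set.seed(12345)
FF <- rweibull(100, 1.7, 1) * 20
FF[FF > 60] <- 59
dir <- sample.int(10, size = 100, replace = TRUE)
grp <- cut(FF, breaks = seq(0, 60, by = 15), ordered_result = TRUE, right = FALSE)
df <- data.frame(dir = dir, grp = grp)

# Option 2 with after_stat(): a discrete x gives every bar the same width
p <- ggplot(data = df,
            aes(x = factor(dir),
                y = after_stat(count) / sum(after_stat(count)),
                fill = grp)) +
  geom_bar()

print(p)
```

Treating dir as a factor also labels every direction on the x axis, which may or may not be wanted for genuinely continuous data.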