ggplot2 geom_bar position failure

I am using the ..count.. transformation in geom_bar and get the warning position_stack requires non-overlapping x intervals when some of my categories have few counts.

This is best explained using some mock data (my data involves direction and windspeed and I retain names relating to that)

#make data
set.seed(12345)
FF=rweibull(100,1.7,1)*20  #mock speeds
FF[FF>60]=59
dir=sample.int(10,size=100,replace=TRUE) # mock directions

#group into speed classes
FFcut=cut(FF,breaks=seq(0,60,by=20),ordered_result=TRUE,right=FALSE,drop=FALSE)

# stuff into data frame & plot
df=data.frame(dir=dir,grp=FFcut)
ggplot(data=df,aes(x=dir,y=(..count..)/sum(..count..),fill=grp)) + geom_bar()

This works fine, and the resulting plot shows the frequency of directions grouped according to speed. It is of relevance that the velocity class with the fewest counts (here "[40,60)") will have 5 counts.

However more velocity classes leads to a warning. For instance, with

FFcut=cut(FF,breaks=seq(0,60,by=15),ordered_result=TRUE,right=FALSE,drop=FALSE)

the velocity class with the fewest counts (now "[45,60)") will have only 3 counts and ggplot2 will warn that

position_stack requires non-overlapping x intervals

and the plot will show data in this category spread out along the x axis. It seems that 5 is the minimum size for a group to have for this to work correctly.

I would appreciate knowing if this is a feature or a bug in stat_bin (which geom_bar is using) or if I am simply abusing geom_bar.

Also, any suggestions how to get around this would be appreciated.

Sincerely

Solution

This occurs because df$dir is numeric, so the ggplot object assumes a continuous x-axis, and aesthetic parameter group is based on the only known discrete variable (fill = grp).

As a result, when there simply aren't that many dir values in grp = [45,60), ggplot gets confused over how wide each bar should be. This becomes more visually obvious if we split the plot into different facets:

ggplot(data=df,
            aes(x=dir,y=(..count..)/sum(..count..),
                fill = grp)) + 
  geom_bar() + 
  facet_wrap(~ grp)

> for(l in levels(df$grp)) print(sort(unique(df$dir[df$grp == l])))
[1]  1  2  3  4  6  7  8  9 10
[1]  1  2  3  4  5  6  7  8  9 10
[1]  2  3  4  5  7  9 10
[1] 2 4 7

We can also check manually that the minimum difference between sorted df$dir values is 1 for the first three grp values, but 2 for the last one. The default bar width is thus wider.

The following solutions should all achieve the same result:

1. Explicitly specify the same bar width for all groups in geom_bar():

ggplot(data=df,
       aes(x=dir,y=(..count..)/sum(..count..),
           fill = grp)) + 
  geom_bar(width = 0.9)

2. Convert dir to a categorical variable before passing it to aes(x = ...):

ggplot(data=df,
       aes(x=factor(dir), y=(..count..)/sum(..count..),
           fill = grp)) + 
  geom_bar()

3. Specify that the group parameter should be based on both df$dir & df$grp:

ggplot(data=df,
       aes(x=dir,
           y=(..count..)/sum(..count..),
           group = interaction(dir, grp),
           fill = grp)) + 
  geom_bar()