
Bar chart - bars jumped to y-axis


I was plotting a bar chart with code that worked perfectly well until some of the data had a value of 0.

barwidth = 0.35

df1:
norms_number   R2.c
1              0.011
2              0
3              0.015
4              0.011
5              0
6              0.012

df2:
norms_number   R2.c
1              0.001
2              0
3              0.012
4              0.006
5              0
6              0.004

test <- ggplot() +
  geom_bar(data = df1, aes(x = norms_number, y = R2.c),
           stat = "identity", position = "dodge", width = barwidth) +
  geom_bar(data = df2, aes(x = norms_number + barwidth + 0.03, y = R2.c),
           stat = "identity", position = "dodge", width = barwidth)

My result was:

[image: resulting plot]

I also got a warning that position_stack requires non-overlapping x intervals (but they are not overlapping?).

I looked into it and changed the DV to factor (from numeric), which half helped, because now the graph looks like this:

[image: resulting plot]

Why are the bars on the y-axis? How else can I get around this weird error with values of 0?


Solution

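  • First of all, you intend to plot a bar chart where the heights are given by a value in the data rather than by the number of cases. geom_bar counts cases by default (stat="count"), whereas geom_col uses the values as they are, so you should be using geom_col instead of geom_bar. See the ggplot2 documentation for geom_bar and geom_col for more details.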
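
    As a quick check of that point, here is a minimal sketch comparing the two geoms on a small toy data frame (the name toy is made up for illustration; its values are the first three rows of df1). It assumes ggplot2 is installed and uses layer_data() to look at the bar heights each layer computes:

    ```r
    library(ggplot2)

    # Hypothetical toy data: first three rows of df1 from the question
    toy <- data.frame(x = factor(1:3), y = c(0.011, 0, 0.015))

    # geom_bar needs stat = "identity" to use the y values directly ...
    p_bar <- ggplot(toy, aes(x = x, y = y)) + geom_bar(stat = "identity")
    # ... while geom_col does that by default
    p_col <- ggplot(toy, aes(x = x, y = y)) + geom_col()

    # Compare the computed top edge of every bar in the two layers
    all.equal(layer_data(p_bar)$ymax, layer_data(p_col)$ymax)
    ```

    So geom_bar(stat="identity") is not wrong as such; geom_col is just the shorter and more explicit way of asking for it.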

    That said, the warning and the odd result arise because, with x=norms_number+barwidth+0.03, you appear to be trying to position the second set of data (df2) by hand relative to the first set (df1).

    For ggplot to dodge, it has to know what to use as the basis for the dodge; it then separates (or "dodges") the observations that share the same x= aesthetic according to that grouping. Under normal circumstances, you would specify something like fill= inside aes(), and ggplot is smart enough to know that whatever you map to fill= will also be the basis for position='dodge'. In the absence of that (or if you want to override it), you need to specify a group= aesthetic to be used for dodging.

    Ultimately, this means that you need to combine your datasets and provide ggplot a way of deciding how to dodge. This makes sense, since both of your dataframes are intended to be placed in the same plot, and both have identical x and y aesthetics. If you leave them as separate dataframes, you can overplot them in the same plot, but there is no good way to have ggplot use position='dodge', because it needs to see all the data in the geom_col call in order to know what to use as the basis for the dodge.

    With all that being said, here's what I would recommend:

    # combine datasets, but first make a marker called "origin"
    # this will be used as a basis for the dodge and fill aesthetics
    df1$origin <- 'df1'
    df2$origin <- 'df2'
    df <- rbind(df1, df2)
    
    # need to change norms_number to a factor to allow for discrete axis
    df$norms_number <- as.factor(df$norms_number)
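
    To try this end to end, the two data frames can be rebuilt from the tables in the question. A minimal sketch, with the values copied from the question and the same combining steps as above:

    ```r
    # Data copied from the tables in the question
    df1 <- data.frame(norms_number = 1:6,
                      R2.c = c(0.011, 0, 0.015, 0.011, 0, 0.012))
    df2 <- data.frame(norms_number = 1:6,
                      R2.c = c(0.001, 0, 0.012, 0.006, 0, 0.004))

    # Mark where each row came from, then stack the two data frames
    df1$origin <- "df1"
    df2$origin <- "df2"
    df <- rbind(df1, df2)

    # Discrete x axis: one level per norms_number
    df$norms_number <- as.factor(df$norms_number)

    # Inspect the combined structure before plotting
    str(df)
    ```

    The combined data frame is in "long" form: one row per bar, with origin telling ggplot which of the two bars at each norms_number the row belongs to.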
    

    You then use only one call to geom_col to get your plot. In the first case, I will use only the group= aesthetic to show you how ggplot uses this for the dodge mechanism:

    ggplot(df, aes(x=norms_number, y=R2.c)) +
      geom_col(position='dodge', width=0.35, aes(group=origin), color='black')
    

    [image: resulting plot]

    As mentioned, you can also just supply a fill= aesthetic, and ggplot will know to use that as the mechanism for dodging:

    ggplot(df, aes(x=norms_number, y=R2.c)) +
      geom_col(position='dodge', width=0.35, aes(fill=origin), color='black')
    

    [image: resulting plot]
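
    If you also want the small gap between the two bars of each pair that the +0.03 in the question was aiming for, one option is to set the dodge width separately with position_dodge(). This is only a sketch, not something tested against the original data files: the data are rebuilt from the tables in the question, gap and p_gap are names made up here, and the doubling relies on how position_dodge() splits the width between two groups per x value.

    ```r
    library(ggplot2)

    # Data copied from the tables in the question
    df1 <- data.frame(norms_number = 1:6,
                      R2.c = c(0.011, 0, 0.015, 0.011, 0, 0.012))
    df2 <- data.frame(norms_number = 1:6,
                      R2.c = c(0.001, 0, 0.012, 0.006, 0, 0.004))
    df1$origin <- "df1"
    df2$origin <- "df2"
    df <- rbind(df1, df2)
    df$norms_number <- as.factor(df$norms_number)

    barwidth <- 0.35  # width of a single bar, as in the question
    gap <- 0.03       # space between the two bars of a pair, as in the question

    # With two groups per x value, each bar gets half of `width`, and the bar
    # centres end up half of the dodge width apart, so both values are doubled.
    p_gap <- ggplot(df, aes(x = norms_number, y = R2.c, fill = origin)) +
      geom_col(width = 2 * barwidth,
               position = position_dodge(width = 2 * (barwidth + gap)),
               color = "black")
    p_gap
    ```

    The bars with a value of 0 are still drawn as zero-height bars in their dodged positions, so nothing jumps to the y-axis.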