Tags: r, ggplot2, lines, geom-bar, errorbar

Overlaying whiskers or error-bar-esque lines on a ggplot


I am creating plots similar to the first example image below, and need plots like the second example below.

library(ggplot2)
library(scales)

# some data
data.2015 = data.frame(score = c(-50,20,15,-40,-10,60),
                       area = c("first","second","third","first","second","third"),
                       group = c("Findings","Findings","Findings","Benchmark","Benchmark","Benchmark"))

data.2014 = data.frame(score = c(-30,40,-15),
                       area = c("first","second","third"),
                       group = c("Findings","Findings","Findings"))

# breaks and limits
breaks.major = c(-60,-40,-22.5,-10, 0,10, 22.5, 40, 60)
breaks.minor = c(-50,-30,-15,-5,0, 5, 15,30,50) 
limits =c(-70,70)

# plot 2015 data
ggplot(data.2015, aes(x = area, y = score, fill = group)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  coord_flip() +
  scale_y_continuous(limits = limits, oob = squish, minor_breaks = breaks.minor, breaks = breaks.major)

[Image: sample plot]

data.2014 only has values for the "Findings" group. I would like to show those 2014 Findings values on the plot, at the corresponding data.2015$area, wherever 2014 data is available.

To show last year's data on just the "Findings" (red) bars, I'd like to use a one-sided errorbar/whisker that starts at the value of the relevant data.2015 bar and ends at the data.2014 value, for example:

[Image: ideal plot]

I thought of doing this with layers, plotting the error bars first so that the 2015 bars could be drawn over them. However, this doesn't work when the 2014 result is smaller in absolute value than the 2015 result, because the error bar is then hidden behind the bar.

Considerations:

  • I'd like the errorbar/whisker to be the same width as the bars, perhaps even a dashed line with a solid cap.
  • Bonus points for a red line when the value has decreased, and a green line when the value has increased.
  • I generate lots of these plots in a loop, sometimes with many groups, and with a different number of areas in each plot. The 2014 data is (at this stage) always displayed for a single group only, and every area has some data (except for one NA case, but I need to allow for that scenario).

EDIT

Building on the solution below, I used that exact code but replaced geom_errorbar with geom_linerange, which draws the lines without caps. I then added a geom_errorbar layer with ymin and ymax set to the same value, which draws only the cap. The result is a one-sided error bar on a ggplot geom_bar. Thanks for the help.
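
A minimal sketch of that combination, assuming the `alldat` data frame from the solution below (rebuilt here so the block runs on its own). `geom_linerange` draws the capless line from the 2015 value to the 2014 value, and `geom_errorbar` with `ymin` and `ymax` both set to the 2014 value draws only the cap. The dashed line type, the `width = 0.9` cap, and the named colour values are illustrative choices, not necessarily the exact settings used, and the axis breaks and limits are left out for brevity.

```r
library(ggplot2)

# Example data from the question
data.2015 <- data.frame(
  score = c(-50, 20, 15, -40, -10, 60),
  area  = c("first", "second", "third", "first", "second", "third"),
  group = c("Findings", "Findings", "Findings", "Benchmark", "Benchmark", "Benchmark")
)
data.2014 <- data.frame(
  score = c(-30, 40, -15),
  area  = c("first", "second", "third"),
  group = rep("Findings", 3)
)

# Data preparation as in the solution
alldat <- merge(data.2015, data.2014, all = TRUE, by = c("area", "group"),
                suffixes = c(".2015", ".2014"))
alldat$plotscore <- with(alldat, ifelse(is.na(score.2014), NA, score.2015))
alldat$direction <- with(alldat, ifelse(score.2015 < score.2014, "dec", "inc"))
alldat$direction[is.na(alldat$score.2014)] <- "absent"

dodge <- position_dodge(width = 0.9)

p <- ggplot(alldat, aes(x = area, y = score.2015, fill = group)) +
  geom_bar(stat = "identity", position = dodge) +
  # Capless dashed line from the 2015 value to the 2014 value
  geom_linerange(aes(ymin = plotscore, ymax = score.2014, color = direction),
                 position = dodge, linetype = "dashed", show.legend = FALSE) +
  # ymin == ymax collapses the error bar to a single solid cap at the 2014 value
  geom_errorbar(aes(ymin = score.2014, ymax = score.2014, color = direction),
                position = dodge, width = 0.9, show.legend = FALSE) +
  coord_flip() +
  scale_color_manual(values = c(absent = NA, dec = "red", inc = "green"))

print(p)
```

Rows without 2014 data have `NA` limits, so ggplot2 drops them from the two overlay layers with a warning, and only the Findings bars get a whisker.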


Solution

  • I believe you can get most of what you want with a little data manipulation. Doing an outer join of the two datasets will let you add the error bars with the appropriate dodging.

    alldat = merge(data.2015, data.2014, all = TRUE, by = c("area", "group"), 
                suffixes = c(".2015", ".2014"))
    

    To make the error bar one-sided, you'll want ymin to be either the same as y or NA depending on the group. It seemed easiest to make a new variable, which I called plotscore, to achieve this.

    alldat$plotscore = with(alldat, ifelse(is.na(score.2014), NA, score.2015))
    

    The last thing I did was to make a variable, direction, recording whether the 2015 score decreased or increased compared to 2014. I included a third category for the Benchmark group as filler because I ran into some issues with the dodging without it.

    alldat$direction = with(alldat, ifelse(score.2015 < score.2014, "dec", "inc"))
    alldat$direction[is.na(alldat$score.2014)] = "absent"
    

    The dataset used for plotting would look like this:

        area     group score.2015 score.2014 plotscore direction
    1  first Benchmark        -40         NA        NA    absent
    2  first  Findings        -50        -30       -50       dec
    3 second Benchmark        -10         NA        NA    absent
    4 second  Findings         20         40        20       dec
    5  third Benchmark         60         NA        NA    absent
    6  third  Findings         15        -15        15       inc
    

    The final code I used looked like this:

    ggplot(alldat, aes(x = area, y = score.2015, fill = group)) +
        geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
        geom_errorbar(aes(ymin = plotscore, ymax = score.2014, color = direction), 
                    position = position_dodge(width = .9), lwd = 1.5, show.legend = FALSE) +
        coord_flip() +
        scale_y_continuous(limits = limits, oob = squish, minor_breaks = breaks.minor, breaks = breaks.major) +
        scale_color_manual(values = c(absent = NA, dec = "red", inc = "green"))
    


    I'm using the development version of ggplot2, ggplot2_1.0.1.9002, and show_guide is now deprecated in favor of show.legend, which I used in geom_errorbar.

    I obviously didn't change the line type of the error bars to dashed with a solid cap, nor did I remove the bottom whisker as I don't know an easy way to do either of these things.
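
Since the question mentions generating these plots in a loop, the data steps in this answer (the merge, `plotscore`, and `direction`) could be wrapped in a small helper. This is only a sketch: `prep_overlay` and its argument names are hypothetical, and it assumes both data frames have `score`, `area`, and `group` columns as in the example data. It also marks a row as "absent" when the current-year score is `NA`, which is one possible way to handle the NA case and not something the code above does.

```r
# Hypothetical helper: merge current and previous scores and derive the
# columns used for the one-sided error bars.
prep_overlay <- function(current, previous, suffixes = c(".2015", ".2014")) {
  out <- merge(current, previous, all = TRUE, by = c("area", "group"),
               suffixes = suffixes)
  cur  <- out[[paste0("score", suffixes[1])]]
  prev <- out[[paste0("score", suffixes[2])]]

  # A comparison is only possible where both years have a value
  has_both <- !is.na(cur) & !is.na(prev)
  out$plotscore <- ifelse(has_both, cur, NA)
  out$direction <- ifelse(has_both, ifelse(cur < prev, "dec", "inc"), "absent")
  out
}

# Usage with the example data from the question
data.2015 <- data.frame(
  score = c(-50, 20, 15, -40, -10, 60),
  area  = c("first", "second", "third", "first", "second", "third"),
  group = c("Findings", "Findings", "Findings", "Benchmark", "Benchmark", "Benchmark")
)
data.2014 <- data.frame(
  score = c(-30, 40, -15),
  area  = c("first", "second", "third"),
  group = rep("Findings", 3)
)

alldat <- prep_overlay(data.2015, data.2014)
print(alldat)
```

With the default suffixes the result has the same columns as the dataset shown above, so the plotting code can be used unchanged inside the loop.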