geom_area stacks areas by default


I am using geom_area to plot a very simple dataset. With geom_line everything looks normal, but when I switch to geom_area, higher values are plotted. I think the graphs are the best way to show the problem:

library(tidyverse)

x <- structure(list(Time = 0:40, X15.DCIA = c(0, 1, 0.5, 0, 2, 2.5, 
      1, 0.5, 0, 1, 1.5, 1, 0.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.5, 3, 
      5, 7, 6.5, 5.5, 4, 3, 2, 1.5, 1, 0.25, 0, 0, 0, 0, 0, 0, 0), 
      X100.DCIA = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
      0, 0, 0, 0, 0, 0, 0, 0, 1.5, 7, 8, 7.5, 6.5, 5, 3.5, 2.25, 
      1.75, 1.1, 0.4, 0.1, 0, 0, 0, 0, 0, 0)),
      class = "data.frame", row.names = c(NA,-41L))

 x %>% gather(prct.DCIA, Vol, -Time) %>% ggplot(aes(x=Time, y=Vol)) +
  geom_line(aes(color=prct.DCIA))


 x %>% gather(prct.DCIA, Vol, -Time) %>% ggplot(aes(x=Time, y=Vol)) +
  geom_area(aes(fill=prct.DCIA))


The geom_line is what I expected (a line plot of my data).

But in the geom_area plot, 100DCIA has jumped up to 15.

I am more interested in an explanation than in a fix or workaround.

Note:

This can be a workaround:

# alpha is a constant here, so it is set outside aes() and needs no legend
x %>% gather(prct.DCIA, Vol, -Time) %>% ggplot(aes(x=Time, y=Vol)) +
      geom_polygon(aes(fill=prct.DCIA), alpha=0.5)

Solution

  • Explanation: geom_area is stacking the two series on top of one another.

    The values you see following the red line in the geom_area graph are the sum of the values for the red and blue lines in your geom_line graph.
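The sum described above can be checked by hand. The following is a minimal sketch in base R, using only the rows around the peak (Time 23 to 27) copied from the question's data; the names `peak` and `stacked_top` are made up for this illustration.

```r
# Values copied from the question's data around the peak (Time 23 to 27).
peak <- data.frame(
  Time      = 23:27,
  X15.DCIA  = c(3, 5, 7, 6.5, 5.5),
  X100.DCIA = c(1.5, 7, 8, 7.5, 6.5)
)

# With position = "stack", the upper edge of the stacked areas
# is the sum of the two series at each Time.
peak$stacked_top <- peak$X15.DCIA + peak$X100.DCIA

print(peak)
max(peak$stacked_top)  # 15 (7 + 8 at Time 25), the height seen in the geom_area plot
```

Neither series exceeds 8 on its own, which is why the geom_line plot never goes that high.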

    You can see this clearly if you separate out prct.DCIA with facet_grid():

    x %>% gather(prct.DCIA, Vol, -Time) %>% ggplot(aes(x=Time, y=Vol)) +
      geom_area(aes(fill=prct.DCIA)) + facet_grid(.~prct.DCIA)
    


    This is simply because position = "stack" is the default argument in geom_area:

    geom_area(mapping = NULL, data = NULL, stat = "identity",
      position = "stack", na.rm = FALSE, show.legend = NA,
      inherit.aes = TRUE, ...)
    

    Presumably this default exists because geom_area is typically used to show how parts add up to a whole, rather than to shade the region under individual lines. Bars and areas usually represent counts or quantities, so the filled region itself is meaningful, whereas points and lines usually represent point estimates, and the space above or below them carries no meaning.

    Compare geom_line, whose default is position = "identity":

    geom_line(mapping = NULL, data = NULL, stat = "identity",
      position = "identity", na.rm = FALSE, show.legend = NA,
      inherit.aes = TRUE, ...)
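The two defaults can also be read from the R console rather than the help pages. This is a small sketch that assumes ggplot2 is installed and relies only on base R's `formals()`.

```r
library(ggplot2)

# formals() lists a function's arguments together with their default values.
formals(geom_area)$position  # "stack", as in the usage shown above
formals(geom_line)$position  # "identity"
```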
    

    Fix: If you use position = position_dodge(), the areas go back to looking like the line graph, with the red area plotted behind the blue area:

    x %>% gather(prct.DCIA, Vol, -Time) %>% ggplot(aes(x=Time, y=Vol)) +
      geom_area(aes(fill=prct.DCIA), position = position_dodge())
    


    Setting alpha below 1 makes the overlap easier to see:

    x %>% gather(prct.DCIA, Vol, -Time) %>% ggplot(aes(x=Time, y=Vol)) +
      geom_area(aes(fill=prct.DCIA), position = position_dodge(), alpha = 0.5)
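Since geom_line's default is position = "identity", another option is to pass that same value to geom_area, which states the intent (no stacking) more directly than dodging. The following is a sketch rather than a definitive recipe: it uses a subset of the question's data already in long format, and the names `long`, `p_stack`, and `p_identity` are made up for this illustration.

```r
library(ggplot2)

# Subset of the question's data around the peak (Time 23 to 27), in long format.
long <- data.frame(
  Time      = rep(23:27, times = 2),
  prct.DCIA = rep(c("X15.DCIA", "X100.DCIA"), each = 5),
  Vol       = c(3, 5, 7, 6.5, 5.5,   # X15.DCIA
                1.5, 7, 8, 7.5, 6.5) # X100.DCIA
)

# Default behaviour: the areas are stacked.
p_stack <- ggplot(long, aes(x = Time, y = Vol, fill = prct.DCIA)) +
  geom_area()

# Same plot with stacking switched off; alpha < 1 keeps both areas visible.
p_identity <- ggplot(long, aes(x = Time, y = Vol, fill = prct.DCIA)) +
  geom_area(position = "identity", alpha = 0.5)

# layer_data() returns the values ggplot2 actually draws. The stacked version
# should top out at the sum of both series, the identity version at the
# larger of the two individual series.
max(layer_data(p_stack)$ymax)
max(layer_data(p_identity)$ymax)
```

Printing `p_identity` should give areas that follow the same outlines as the geom_line plot.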
    
