r, ggplot2, stacked-area-chart

How can I get my area plot to stack using ggplot?


I am trying to get my cumulative area plot to stack using the code below, which is based on http://dantalus.github.io/2015/08/16/step-plots/. I have added position="stack" to each layer, but the areas still overlap instead of stacking.

The aim of what I am trying to achieve is to show the cumulative number of publications each year within a given period. So, as an example, in 1940 there may be one publication, the following year there may be 2 more, bringing the cumulative total to 3.

What would be the best way to get the areas to stack on top of each other?

How can the order be controlled? Would I need to use arrange() to order TERM2?

ggplot(data=working, aes(x=Year, color=TERM2, fill=TERM2)) +
  stat_bin(data = subset(working, TERM2=="A"), bins=80, aes(y=cumsum(..count..)),geom="area", position="stack", alpha=0.1) +
  stat_bin(data = subset(working, TERM2=="B"), bins=80, aes(y=cumsum(..count..)),geom="area", position="stack",alpha=0.1) +
  stat_bin(data = subset(working, TERM2=="Both"),bins=80, aes(y=cumsum(..count..)),geom="area", position="stack", alpha=0.1) + 
  ylab("Total Number") + xlim(1940,2020) + ggtitle("Cumulative number by measurement method")

What I am currently getting:

[Image: Cumulative Area Plot]

Example of what I am trying to achieve:

The following chart was created in Excel using the same data which is exactly what I am looking to achieve in R.

[Image: Excel Example]

My Data:

Example of how my data is currently structured:

 Year TERM2
 1944     A
 1959     B
 1966     A
 1968     B
 1968     A
 1970     A
 1971     B
 1971     B
 1971     A
 1971     A
 1971  Both
 1971  Both
 1971  Both
 1972     A
 1972  Both
 1972  Both
 1973     B
 1973     A
 1974     A
 1974     A

'data.frame':   803 obs. of  6 variables:
 $ Year          : int  1944 1959 1966 1968 1968 1970 1971 1971 1971 1971 ...
 $ TERM2         : Factor w/ 3 levels "B","A","Both": 2 1 2 1 2 2 1 1 2 2 ...

Changes based on user127649's suggestions

This is the plot after applying user127649's suggestions. It is close to what I expect, except that I want the totals to start at 0 and end at 803 (the total number of publications).

ggplot(data=working, aes(x=Year, color=TERM2, fill=TERM2)) +
  stat_bin(bins=80, aes(y=cumsum(..count..)), geom="area", alpha=0.1) +
  ylab("Total Number") + xlim(1940,2020) + ggtitle("Cumulative number by measurement method")

[Image: plot after suggestions]


Solution

  • I think there were two issues.

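    1. When you use stat_bin() in three separate layers, each layer effectively has its own independent data set. This gives the correct counts, but (and this is really a guess) I think areas drawn in three separate layers can't be stacked on each other, because position="stack" only works within a layer.

    2. If you use a single stat_bin() layer for all the groups, I think cumsum(..count..) is applied to the binned counts of the layer as a whole rather than to each TERM2 group separately, so the running total carries over from one group to the next.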

    I don’t know whether this is the best approach or not, but I think it’s what you’re after.
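To illustrate the second point above, here is a small base R sketch. The group numbers and counts are made up for illustration, and the layout (all bins of one group followed by all bins of the next) is an assumption about how the binned counts are arranged. It only shows the difference between one cumsum() over everything and a cumsum() that restarts per group.

```r
# Hypothetical per-bin counts for two groups (made-up numbers):
# all bins of group 1 first, then all bins of group 2.
binned <- data.frame(
  group = rep(1:2, each = 4),
  count = c(1, 0, 2, 1,   0, 3, 1, 1)
)

# cumsum() over the whole column: group 2 starts at group 1's total (4), not at 0.
binned$running_all <- cumsum(binned$count)

# ave() applies cumsum() within each group, so the total restarts per group.
binned$running_by_group <- ave(binned$count, binned$group, FUN = cumsum)

print(binned)
```

In the running_all column the second group never starts from 0, which matches the symptom described in the question's update.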

    Data

    The data are grouped and cumsum() is used on each group separately.

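    library(tidyverse)
    
    working <- working %>% 
         # number of publications per year and term
         count(Year, TERM2) %>% 
         # one column per term; year/term pairs with no rows become 0
         spread(TERM2, n, fill = 0) %>% 
         # running total down each term's column
         mutate_at(vars('A', 'B', 'Both'), cumsum) %>% 
         # back to long format, keeping the column order as factor levels
         gather(TERM2, N, -Year, factor_key = TRUE) #%>% 
         # mutate(TERM2 = ordered(TERM2, levels = rev(levels(TERM2))))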
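If it helps to see what the reshaped data looks like, here is a rough base R sketch of the same idea (count per year and term, then a running total per term) applied to the 20 sample rows from the question. It swaps table() and apply() in for the tidyverse verbs, and the names sample_rows, counts, cumulative and long are just illustrative.

```r
# The 20 sample rows shown in the question (not the full 803-row data set).
sample_rows <- data.frame(
  Year  = c(1944, 1959, 1966, 1968, 1968, 1970, 1971, 1971, 1971, 1971,
            1971, 1971, 1971, 1972, 1972, 1972, 1973, 1973, 1974, 1974),
  TERM2 = factor(c("A", "B", "A", "B", "A", "A", "B", "B", "A", "A",
                   "Both", "Both", "Both", "A", "Both", "Both", "B", "A", "A", "A"),
                 levels = c("B", "A", "Both"))
)

# Year-by-term contingency table; combinations with no rows are counted as 0.
counts <- table(sample_rows$Year, sample_rows$TERM2)

# cumsum() down each column gives a separate running total per term.
cumulative <- apply(counts, 2, cumsum)

# Long format: one row per Year/TERM2 pair, with the running total in N.
long <- data.frame(
  Year  = as.integer(rownames(cumulative))[row(cumulative)],
  TERM2 = factor(colnames(cumulative)[col(cumulative)],
                 levels = levels(sample_rows$TERM2)),
  N     = as.vector(cumulative)
)

print(cumulative)
```

Each column starts from its own first count, and the last row adds up to the number of input rows, which is the "start at 0, end at the total" behaviour the question asks for.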
    

    Plot

    This code will produce the first plot below. If you prefer the look of the second plot, un-comment the trailing %>% and the last line of the data manipulation chunk, which reverses the order of the TERM2 levels.

    ggplot(working, aes(Year, N, fill = TERM2)) + 
         geom_area(position = 'stack') +
         ylab("Total Number")
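On the ordering question: as far as I know, the stacking order follows the factor levels of the fill variable rather than the row order, so arrange() on its own should not change it. Below is a minimal sketch of two ways to set the levels. The totals data frame and its numbers are hypothetical, and the plot is only built if ggplot2 is installed.

```r
# Hypothetical cumulative totals in long format (made-up numbers for illustration).
totals <- data.frame(
  Year  = rep(c(1970, 1971, 1972), times = 3),
  TERM2 = factor(rep(c("B", "A", "Both"), each = 3), levels = c("B", "A", "Both")),
  N     = c(1, 3, 3,  4, 6, 7,  0, 3, 5)
)

# Option 1: reverse the existing level order (what the commented-out line does).
reversed <- factor(totals$TERM2, levels = rev(levels(totals$TERM2)))

# Option 2: spell out the desired order explicitly.
totals$TERM2 <- factor(totals$TERM2, levels = c("A", "B", "Both"))

# Build the plot only if ggplot2 is available; the stack follows the levels set above.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(totals, ggplot2::aes(Year, N, fill = TERM2)) +
    ggplot2::geom_area(position = "stack") +
    ggplot2::ylab("Total Number")
}
```

Changing the levels before plotting keeps the legend and the stack in the same order, which is usually easier to read than reordering only one of them.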
    

    Result

    [Image: resulting plots]