
Why is ggplot geom_area empty when attempting to plot a stacked area graph based on aggregated data?


I am attempting to create a stacked area graph to show the proportion of tweets per month/year for an assigned Topic. My dataframe has three columns: tweet_time, Topic and count. Its head() is pasted below. I have looked at similar questions, such as "Why is my stacked area graph in ggplot2 empty" and "R ggplot2 geom_area() not working", but their solutions do not fix this case.

My dataframe is as follows:

 tweet_time Topic count
   <chr>      <chr> <dbl>
 1 01-2012    2         3
 2 01-2012    3         4
 3 01-2012    4         4
 4 01-2012    5         2
 5 01-2013    1        15
 6 01-2013    2        57
 7 01-2013    3        65
 8 01-2013    4        66
 9 01-2013    5        54
10 01-2014    1         3
11 01-2014    2         7
12 01-2014    3        10
13 01-2014    4         5
14 01-2014    5         2
15 01-2015    1         3
16 01-2015    2         6
17 01-2015    3         6
18 01-2015    4         5
19 01-2015    5         8
20 01-2016    1         7

And the code I am using for the plot is currently:

ggplot(test, aes(x = tweet_time,y = count, fill = Topic))+
 geom_area(aes(fill= Topic, position='stack'))

I am wondering whether the issue has something to do with the tweet_time column not being sorted by month (i.e. 02-2012 does not come immediately after 01-2012) and not being in a date format. However, when I try to convert it with as.Date() inside mutate(), it does not recognise the format.

Any help would be great.


Solution

I think there are three issues here that might be causing your problem or would lead to one down the line:

1. Date not in date format

I add mutate(tweet_time = lubridate::dmy(paste(1, tweet_time))) %>% to convert tweet_time to a date, which ggplot2 places on a continuous, chronologically ordered axis. While tweet_time is a character column, ggplot2 treats x as discrete, and its default grouping splits the data by every discrete variable, including x itself, so each group holds a single point and geom_area() has nothing to connect.

2. Missing combinations

Area plots can be drawn incorrectly when zeroes are left out of a series, because it is ambiguous to ggplot whether to join the data points that exist (what it does) or to treat a missing point as a zero (usually what we want). For example, Topic 1 has no row for 01-2012 in your data. You can add tidyr::complete(tweet_time, Topic, fill = list(count = 0)) %>% to add those rows with a count of 0.

3. Fill as integer

For area plots, ggplot may throw Error: Aesthetics can not vary with a ribbon if fill is an integer instead of a character or factor. This happens because ggplot2 builds groups only from discrete variables: with a numeric fill, all topics end up in one group, and a single ribbon cannot change its fill along its length. The easiest fix is to make fill a character or factor, e.g. fill = as.character(Topic).
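To make the tidyr::complete() step more concrete, here is a minimal base-R sketch of what it does, run on a small hypothetical subset of the data: build every month/Topic pair with expand.grid(), left-join the counts with merge(), and treat the resulting NAs as zero tweets. It is only an illustration of the idea, not a replacement for complete().

```r
# Hypothetical subset of the question's data: in this subset, Topic 1 has no
# row for 01-2012 and Topic 3 has no row for 01-2013
df <- data.frame(
  tweet_time = c("01-2012", "01-2012", "01-2013", "01-2013"),
  Topic = c(2L, 3L, 1L, 2L),
  count = c(3, 4, 15, 57),
  stringsAsFactors = FALSE
)

# Every combination of month and topic
grid <- expand.grid(
  tweet_time = unique(df$tweet_time),
  Topic = sort(unique(df$Topic)),
  stringsAsFactors = FALSE
)

# Left join: combinations absent from df get count = NA ...
full <- merge(grid, df, by = c("tweet_time", "Topic"), all.x = TRUE)
# ... which are then recorded as explicit zeroes
full$count[is.na(full$count)] <- 0

# Order by month, then topic, so each series is contiguous
full <- full[order(full$tweet_time, full$Topic), ]
print(full)
```

Every month now has one row per topic, so each stacked area drops to zero instead of being joined across the gap.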

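The code below works for me. Note that position = 'stack' is an argument to geom_area(), not an aesthetic: inside aes() it is ignored with an "unknown aesthetics" warning (stacking is geom_area()'s default anyway).

library(tidyverse)  # loads dplyr, tidyr and ggplot2, plus the %>% pipe

data.frame(
  stringsAsFactors = FALSE,
  tweet_time = c("01-2012", "01-2012", "01-2012", "01-2012",
                 "01-2013", "01-2013", "01-2013", "01-2013", "01-2013",
                 "01-2014", "01-2014", "01-2014", "01-2014", "01-2014",
                 "01-2015", "01-2015", "01-2015", "01-2015", "01-2015",
                 "01-2016"),
  Topic = c(2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L,
            2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L),
  count = c(3L, 4L, 4L, 2L, 15L, 57L, 65L, 66L, 54L, 3L,
            7L, 10L, 5L, 2L, 3L, 6L, 6L, 5L, 8L, 7L)
) %>%
  # Missing combinations: add a row with count = 0 for every absent month/Topic pair
  tidyr::complete(tweet_time, Topic, fill = list(count = 0)) %>%
  # Date format: parse "MM-YYYY" strings into Dates (first day of each month)
  mutate(tweet_time = lubridate::my(tweet_time)) %>%
  # Fill as integer: map fill to a discrete variable so each Topic gets its own area
  ggplot(aes(tweet_time, count, fill = as.character(Topic))) +
  geom_area(position = "stack")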
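On the as.Date() problem from the question: as.Date() needs a complete date, and a value such as "01-2012" has no day component, so it cannot be parsed as is. If you would rather not depend on lubridate, a minimal base-R sketch (assuming every value is in MM-YYYY form; the 02-2012 value below is hypothetical) is to prepend a fixed day before parsing. It also shows why the raw strings sort out of order while Dates do not.

```r
# Sample strings in the question's "MM-YYYY" format (02-2012 is hypothetical)
tweet_time <- c("01-2013", "02-2012", "01-2012")

# Sorted as text, 02-2012 lands after 01-2013
text_order <- sort(tweet_time)

# as.Date() needs a day, so prepend the 1st of the month, then parse
dates <- as.Date(paste0("01-", tweet_time), format = "%d-%m-%Y")

# Real Dates sort chronologically
date_order <- sort(dates)

print(text_order)
print(date_order)
```

Here text_order is "01-2012" "01-2013" "02-2012", whereas date_order runs 2012-01-01, 2012-02-01, 2013-01-01, which is the order an area plot's x axis needs.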
    
