Tags: ggplot2, gantt-chart, geom-bar

GGPlot combining/overlaying column and line (Gantt) charts


I would like to overlay rainfall data (column) over a Gantt chart that contains 'suggested sowing windows' and actual sowing dates. From the dataset, I can create both separately but not on one chart. Any pointers greatly appreciated.

[image not reproduced]

## plot Gantt chart with suggested sowing dates and actual sowing dates
sowdate.df$Element <- factor(sowdate.df$Element, levels = c("SOWING DATE", "Dart", "Spitfire", "Suntop", "Beckom", "Flanker", "Lancer", "Sunmax", "Kittyhawk"))
ggplot(sowdate.df, aes(Date1, Element, colour = Category, group = Item)) +
  geom_line(size = 10)

## plot rainfall
ggplot(sowdate.df, aes(Date1, rain)) + geom_col()


## combine Gantt and rainfall
ggplot(sowdate.df) + 
  geom_col(aes(Date1, rain), size = 1, color = "darkblue", fill = "white") +
  geom_line(aes(Date1, Element, Color=Category, group=Item), size = 1.5, color="red", group = 1)



      Item     Element    Category Start-End      Date1 rain
1     1      Beckom     Variety     Start 2018-05-07   NA
2     2        Dart     Variety     Start 2018-06-01   NA
3     3     Flanker     Variety     Start 2018-05-01   NA
4     4   Kittyhawk     Variety     Start 2018-04-01   NA
5     5      Lancer     Variety     Start 2018-05-01   NA
6     6 SOWING DATE Sowing date     Start 2018-06-06   NA
7     7 SOWING DATE Sowing date     Start 2018-06-26   NA
8     8 SOWING DATE Sowing date     Start 2018-07-03   NA
9     9 SOWING DATE Sowing date     Start 2018-07-12   NA
10   10    Spitfire     Variety     Start 2018-05-21   NA
11   11      Sunmax     Variety     Start 2018-04-15   NA
12   12      Suntop     Variety     Start 2018-05-07   NA
13    1      Beckom     Variety       End 2018-05-31   NA
14    2        Dart     Variety       End 2018-06-30   NA
15    3     Flanker     Variety       End 2018-05-21   NA
16    4   Kittyhawk     Variety       End 2018-05-07   NA
17    5      Lancer     Variety       End 2018-05-21   NA
18    6 SOWING DATE Sowing date       End 2018-06-07   NA
19    7 SOWING DATE Sowing date       End 2018-06-27   NA
20    8 SOWING DATE Sowing date       End 2018-07-04   NA
21    9 SOWING DATE Sowing date       End 2018-07-13   NA
22   10    Spitfire     Variety       End 2018-06-21   NA
23   11      Sunmax     Variety       End 2018-05-07   NA
24   12      Suntop     Variety       End 2018-06-07   NA
25   13        <NA>    Rainfall      <NA> 2018-04-14  3.0
26   14        <NA>    Rainfall      <NA> 2018-03-30  7.0
27   15        <NA>    Rainfall      <NA> 2018-06-10  3.5
28   16        <NA>    Rainfall      <NA> 2018-06-18  4.0
29   17        <NA>    Rainfall      <NA> 2018-06-28 13.5
30   18        <NA>    Rainfall      <NA> 2018-07-23  3.0
31   19        <NA>    Rainfall      <NA> 2018-08-05  6.0
32   20        <NA>    Rainfall      <NA> 2018-08-25 23.0
33   21        <NA>    Rainfall      <NA> 2018-09-10  5.0

Solution

  • As you can see in the image you posted, the plot you were shown simply overlays two plots. This is possible with ggplot2, but I don't find it very elegant, and it can be tricky because you need to find the exact positions of both plots for the result to look neat.

    Your workaround of using geom_line with your factor levels as y values is interesting, but I am not sure it is desirable.

    In any case, this is probably the core of your problem: you are mixing different y measures of different classes, factor levels for one plot and numeric/integer values for the other. I would not try to force those onto one y-axis. I would rather create two plots and combine them with one of the plot-combining packages, such as patchwork (Method 1 below).

    I have renamed your columns, used the read.so package from GitHub user @alistaire47 to read your data, and changed some columns in order to achieve the plot. The key is using the right classes: dates as dates, numerics as numerics.
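
    Why the classes matter can be seen in a few lines of base R. This is only an illustrative sketch; the vector names and the stray "Start" value are made up for the example and are not taken from the data frame used for the plots.

    ```r
    # Values as they might arrive after parsing: everything is text
    date_chr <- c("2018-05-07", "2018-05-31")
    rain_chr <- c("3.0", "13.5", "Start")   # hypothetical stray text value

    # Convert to the classes the plots need
    date_ok <- as.Date(date_chr)
    rain_ok <- suppressWarnings(as.numeric(rain_chr))  # non-numeric text becomes NA

    class(date_ok)   # "Date"
    diff(date_ok)    # date arithmetic works: a difference of 24 days
    rain_ok          # 3.0 13.5 NA

    # A factor is discrete, so it cannot share a continuous y-axis as-is
    element_f <- factor(c("Dart", "Beckom"), levels = c("Dart", "Beckom"))
    is.numeric(element_f)   # FALSE
    as.integer(element_f)   # 1 2: the integer codes behind the levels
    ```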

    First read your data:

    sowdate.df <- read.so::read_so('Item     Element    Category Start_End      Date1 rain
    1     1      Beckom     Variety     Start 2018-05-07   NA
    2     2        Dart     Variety     Start 2018-06-01   NA
    3     3     Flanker     Variety     Start 2018-05-01   NA
    4     4   Kittyhawk     Variety     Start 2018-04-01   NA
    5     5      Lancer     Variety     Start 2018-05-01   NA
    6     6 SOWING DATE Sowing date     Start 2018-06-06   NA
    7     7 SOWING DATE Sowing date     Start 2018-06-26   NA
    8     8 SOWING DATE Sowing date     Start 2018-07-03   NA
    9     9 SOWING DATE Sowing date     Start 2018-07-12   NA
    10   10    Spitfire     Variety     Start 2018-05-21   NA
    11   11      Sunmax     Variety     Start 2018-04-15   NA
    12   12      Suntop     Variety     Start 2018-05-07   NA
    13    1      Beckom     Variety       End 2018-05-31   NA
    14    2        Dart     Variety       End 2018-06-30   NA
    15    3     Flanker     Variety       End 2018-05-21   NA
    16    4   Kittyhawk     Variety       End 2018-05-07   NA
    17    5      Lancer     Variety       End 2018-05-21   NA
    18    6 SOWING DATE Sowing date       End 2018-06-07   NA
    19    7 SOWING DATE Sowing date       End 2018-06-27   NA
    20    8 SOWING DATE Sowing date       End 2018-07-04   NA
    21    9 SOWING DATE Sowing date       End 2018-07-13   NA
    22   10    Spitfire     Variety       End 2018-06-21   NA
    23   11      Sunmax     Variety       End 2018-05-07   NA
    24   12      Suntop     Variety       End 2018-06-07   NA
    25   13        <NA>    Rainfall      <NA> 2018-04-14  3.0
    26   14        <NA>    Rainfall      <NA> 2018-03-30  7.0
    27   15        <NA>    Rainfall      <NA> 2018-06-10  3.5
    28   16        <NA>    Rainfall      <NA> 2018-06-18  4.0
    29   17        <NA>    Rainfall      <NA> 2018-06-28 13.5
    30   18        <NA>    Rainfall      <NA> 2018-07-23  3.0
    31   19        <NA>    Rainfall      <NA> 2018-08-05  6.0
    32   20        <NA>    Rainfall      <NA> 2018-08-25 23.0
    33   21        <NA>    Rainfall      <NA> 2018-09-10  5.0')
    #> Warning: 8 parsing failures.
    #> row col  expected    actual         file
    #>   6  -- 6 columns 8 columns literal data
    #>   7  -- 6 columns 8 columns literal data
    #>   8  -- 6 columns 8 columns literal data
    #>   9  -- 6 columns 8 columns literal data
    #>  18  -- 6 columns 8 columns literal data
    #> ... ... ......... ......... ............
    #> See problems(...) for more details.
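
    If installing a GitHub package is not an option, or the parsing failures are a concern (they affect the `SOWING DATE` rows, presumably because of the spaces in those values), the same structure can be typed directly into a base R data frame. The sketch below is an assumed alternative, not what was used for the plots in this answer; it contains only a handful of rows from the question, and `sowdate_small` is a made-up name to keep it apart from `sowdate.df`.

    ```r
    # Hypothetical subset of the question's data, created with the intended classes
    sowdate_small <- data.frame(
      Item      = c(1, 6, 1, 6, 13, 17),
      Element   = c("Beckom", "SOWING DATE", "Beckom", "SOWING DATE", NA, NA),
      Category  = c("Variety", "Sowing date", "Variety", "Sowing date",
                    "Rainfall", "Rainfall"),
      Start_End = c("Start", "Start", "End", "End", NA, NA),
      Date1     = as.Date(c("2018-05-07", "2018-06-06", "2018-05-31",
                            "2018-06-07", "2018-04-14", "2018-06-28")),
      rain      = c(NA, NA, NA, NA, 3.0, 13.5),
      stringsAsFactors = FALSE
    )

    str(sowdate_small)   # Date1 is a Date, rain is numeric
    ```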
    

    Now the plots:

    library(tidyverse)
    library(patchwork)
    

    Prepare the data (the messiness comes from scaling the rain values to your factor levels):

    sowdate <- sowdate.df %>%
      mutate(
        element_f = factor(Element, levels = c("SOWING DATE", "Dart", "Spitfire", "Suntop", "Beckom", "Flanker", "Lancer", "Sunmax", "Kittyhawk")),
        date = as.Date(Date1),
        rain = as.numeric(rain),
        # scale rain so that its maximum reaches the top factor level
        rain_scaled = rain * nlevels(element_f) / max(rain, na.rm = TRUE)
      )
    #> Warning: NAs introduced by coercion
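
    To make the scaling concrete: with nine factor levels and a maximum rainfall of 23, every rain value is multiplied by 9/23, so the tallest column reaches the top factor level. A small base-R check, where `rain` simply repeats the rainfall values from the question and `n_levels` is an illustrative name:

    ```r
    rain     <- c(3.0, 7.0, 3.5, 4.0, 13.5, 3.0, 6.0, 23.0, 5.0)
    n_levels <- 9   # "SOWING DATE" plus eight varieties

    rain_scaled <- rain * n_levels / max(rain, na.rm = TRUE)

    max(rain_scaled)           # 9: the largest value lands on the top factor level
    round(rain_scaled[5], 2)   # 5.28: where a rain value of 13.5 ends up
    ```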
    

    Method 1: combine the plots using patchwork. I recommend this, so as not to mix different classes on one y-axis.

    p1 <- ggplot(sowdate, aes(date, element_f, colour = Category, group = Item)) +
      geom_line(size = 10) +
      theme(axis.title.x = element_blank(),
            axis.text.x = element_blank(),
            axis.ticks.x = element_blank(),
            plot.margin = margin(b = 0))
    p2 <- ggplot(sowdate) +
      geom_col(aes(date, rain)) +
      theme(plot.margin = margin(t = 0))
    p1 + p2 + plot_layout(nrow = 2)
    #> Warning: Removed 8 rows containing missing values (geom_path).
    #> Warning: Removed 24 rows containing missing values (position_stack).
    

    I removed the x-axis text, title and ticks from the first plot, and set the lower margin of the first plot and the upper margin of the second plot to zero, to bring them closer together.

    Method 2: combine the different variable classes on one axis. I don't recommend this; it gets quite messy, as you can see above and below. You need to scale your rain values to your factor levels so that the columns overlap the Gantt bars and don't get too tall. This in turn requires a second y-axis. For that, you have to make your factor levels numeric, then create breaks and labels for the left y-axis, then transform the scaled rain values back to their real values for the right axis, and hope that the breaks more or less work. I don't think a second y-axis really helps in reading the graph.
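
    The two mappings involved here, factor levels to numeric breaks and scaled rain back to real rain, can be tried on their own first. A minimal base-R sketch, assuming the same nine levels and a maximum rainfall of 23 as in the data above; `to_factor_scale` and `to_rain_scale` are hypothetical helper names:

    ```r
    lvls <- c("SOWING DATE", "Dart", "Spitfire", "Suntop", "Beckom",
              "Flanker", "Lancer", "Sunmax", "Kittyhawk")
    element_f <- factor(c("Dart", "Kittyhawk", "SOWING DATE"), levels = lvls)

    # Left axis: one numeric break per level, labelled with the level name
    breaks_ax <- seq_along(lvls)
    labels_ax <- lvls
    as.numeric(element_f)   # 2 9 1: positions on the numeric y-axis

    # Right axis: undo the scaling so the tick labels show real rain values
    max_rain <- 23
    to_factor_scale <- function(rain) rain * length(lvls) / max_rain
    to_rain_scale   <- function(y) y * max_rain / length(lvls)

    to_rain_scale(to_factor_scale(13.5))   # 13.5: the round trip recovers the input
    ```

    The formula passed to sec_axis plays the role of `to_rain_scale` here.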

    
    max_rain <- max(sowdate$rain, na.rm = TRUE)
    # one numeric break per factor level, labelled with the level names
    breaks_ax <- seq_along(levels(sowdate$element_f))
    labels_ax <- levels(sowdate$element_f)
    
    ggplot(sowdate, aes(date, as.numeric(element_f), colour = Category, group = Item)) +
      geom_line(size = 10) +
      geom_col(aes(date, rain_scaled)) +
      scale_y_continuous(breaks = breaks_ax, labels = labels_ax,
                         # back-transform the scaled values to real rain values
                         sec.axis = sec_axis(~ . * max_rain / nlevels(sowdate$element_f))) +
      labs(y = 'Element')
    #> Warning: Removed 24 rows containing missing values (position_stack).
    #> Warning: Removed 17 rows containing missing values (geom_path).
    

    Created on 2020-01-22 by the reprex package (v0.3.0)