I would like to overlay rainfall data (column) over a Gantt chart that contains 'suggested sowing windows' and actual sowing dates. From the dataset, I can create both separately but not on one chart. Any pointers greatly appreciated.
## plot Gantt chart with suggested sowing dates and actual sowing dates
sowdate.df$Element <- factor(sowdate.df$Element,levels=c("SOWING DATE","Dart","Spitfire","Suntop","Beckom","Flanker","Lancer","Sunmax","Kittyhawk"))
ggplot(sowdate.df, aes(Date1, Element, Color=Category, group=Item)) +
geom_line(size = 10)
## plot rainfall
ggplot(sowdate.df, aes(Date1, rain)) + geom_col()
## combine Gantt and rainfall
ggplot(sowdate.df) +
geom_col(aes(Date1, rain), size = 1, color = "darkblue", fill = "white") +
geom_line(aes(Date1, Element, Color=Category, group=Item), size = 1.5, color="red", group = 1)
Item Element Category Start-End Date1 rain
1 1 Beckom Variety Start 2018-05-07 NA
2 2 Dart Variety Start 2018-06-01 NA
3 3 Flanker Variety Start 2018-05-01 NA
4 4 Kittyhawk Variety Start 2018-04-01 NA
5 5 Lancer Variety Start 2018-05-01 NA
6 6 SOWING DATE Sowing date Start 2018-06-06 NA
7 7 SOWING DATE Sowing date Start 2018-06-26 NA
8 8 SOWING DATE Sowing date Start 2018-07-03 NA
9 9 SOWING DATE Sowing date Start 2018-07-12 NA
10 10 Spitfire Variety Start 2018-05-21 NA
11 11 Sunmax Variety Start 2018-04-15 NA
12 12 Suntop Variety Start 2018-05-07 NA
13 1 Beckom Variety End 2018-05-31 NA
14 2 Dart Variety End 2018-06-30 NA
15 3 Flanker Variety End 2018-05-21 NA
16 4 Kittyhawk Variety End 2018-05-07 NA
17 5 Lancer Variety End 2018-05-21 NA
18 6 SOWING DATE Sowing date End 2018-06-07 NA
19 7 SOWING DATE Sowing date End 2018-06-27 NA
20 8 SOWING DATE Sowing date End 2018-07-04 NA
21 9 SOWING DATE Sowing date End 2018-07-13 NA
22 10 Spitfire Variety End 2018-06-21 NA
23 11 Sunmax Variety End 2018-05-07 NA
24 12 Suntop Variety End 2018-06-07 NA
25 13 <NA> Rainfall <NA> 2018-04-14 3.0
26 14 <NA> Rainfall <NA> 2018-03-30 7.0
27 15 <NA> Rainfall <NA> 2018-06-10 3.5
28 16 <NA> Rainfall <NA> 2018-06-18 4.0
29 17 <NA> Rainfall <NA> 2018-06-28 13.5
30 18 <NA> Rainfall <NA> 2018-07-23 3.0
31 19 <NA> Rainfall <NA> 2018-08-05 6.0
32 20 <NA> Rainfall <NA> 2018-08-25 23.0
33 21 <NA> Rainfall <NA> 2018-09-10 5.0
As you can see in the image you posted, that plot simply overlays two plots. This is also possible with ggplot2, but I don't find it very elegant, and it can be tricky, because you need to position both plots exactly for the result to look neat.
Your workaround using geom_line with your factor levels as y values is interesting, but I am not sure it is desirable.
In any case, this is probably the core of your problem: you are mixing different y measures, and they are of different classes (factor levels for one plot, numeric / integer values for the other). I would not try to force those onto one y-axis. I would rather create two plots and combine them with one of the plot-combining packages, such as patchwork, like so:
I have renamed your columns, used the read.so package from GitHub user @alistaire47 to read your data, and changed some columns in order to build the plot. The key is using the right classes: dates as dates, numerics as numerics.
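To illustrate why the classes matter, here is a minimal base R sketch on toy vectors. The names `element`, `rain_txt` and `date_txt` are made up for the illustration and only a few values are copied from the question's data; it is not the actual conversion of the full data frame.

```r
# Toy vectors standing in for three rows of the question's data
element  <- factor(c("Dart", "SOWING DATE", "Beckom"),
                   levels = c("SOWING DATE", "Dart", "Beckom"))
rain_txt <- c("3.0", "13.5", NA)                        # rainfall read in as text
date_txt <- c("2018-04-14", "2018-06-28", "2018-05-07") # dates read in as text

# A factor is discrete: converting it to numbers returns level positions,
# not measurements
element_pos <- as.integer(element)

# Convert the text columns to the classes ggplot2 needs for continuous axes
rain <- as.numeric(rain_txt)
date <- as.Date(date_txt)

class(element)   # factor  -> discrete y scale
class(rain)      # numeric -> continuous y scale
class(date)      # Date    -> date x scale
```

A factor and a numeric column therefore ask for two different kinds of y scale, which is why a single plot cannot hold both without converting one of them.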
First read your data:
sowdate.df <- read.so::read_so('Item Element Category Start_End Date1 rain
1 1 Beckom Variety Start 2018-05-07 NA
2 2 Dart Variety Start 2018-06-01 NA
3 3 Flanker Variety Start 2018-05-01 NA
4 4 Kittyhawk Variety Start 2018-04-01 NA
5 5 Lancer Variety Start 2018-05-01 NA
6 6 SOWING DATE Sowing date Start 2018-06-06 NA
7 7 SOWING DATE Sowing date Start 2018-06-26 NA
8 8 SOWING DATE Sowing date Start 2018-07-03 NA
9 9 SOWING DATE Sowing date Start 2018-07-12 NA
10 10 Spitfire Variety Start 2018-05-21 NA
11 11 Sunmax Variety Start 2018-04-15 NA
12 12 Suntop Variety Start 2018-05-07 NA
13 1 Beckom Variety End 2018-05-31 NA
14 2 Dart Variety End 2018-06-30 NA
15 3 Flanker Variety End 2018-05-21 NA
16 4 Kittyhawk Variety End 2018-05-07 NA
17 5 Lancer Variety End 2018-05-21 NA
18 6 SOWING DATE Sowing date End 2018-06-07 NA
19 7 SOWING DATE Sowing date End 2018-06-27 NA
20 8 SOWING DATE Sowing date End 2018-07-04 NA
21 9 SOWING DATE Sowing date End 2018-07-13 NA
22 10 Spitfire Variety End 2018-06-21 NA
23 11 Sunmax Variety End 2018-05-07 NA
24 12 Suntop Variety End 2018-06-07 NA
25 13 <NA> Rainfall <NA> 2018-04-14 3.0
26 14 <NA> Rainfall <NA> 2018-03-30 7.0
27 15 <NA> Rainfall <NA> 2018-06-10 3.5
28 16 <NA> Rainfall <NA> 2018-06-18 4.0
29 17 <NA> Rainfall <NA> 2018-06-28 13.5
30 18 <NA> Rainfall <NA> 2018-07-23 3.0
31 19 <NA> Rainfall <NA> 2018-08-05 6.0
32 20 <NA> Rainfall <NA> 2018-08-25 23.0
33 21 <NA> Rainfall <NA> 2018-09-10 5.0')
#> Warning: 8 parsing failures.
#> row col expected actual file
#> 6 -- 6 columns 8 columns literal data
#> 7 -- 6 columns 8 columns literal data
#> 8 -- 6 columns 8 columns literal data
#> 9 -- 6 columns 8 columns literal data
#> 18 -- 6 columns 8 columns literal data
#> ... ... ......... ......... ............
#> See problems(...) for more details.
Now the plots:
library(tidyverse)
library(patchwork)
Prepare the data (the messiness comes from scaling the rain values to your factor levels):
sowdate <- sowdate.df %>%
  mutate(element_f = factor(Element, levels = c("SOWING DATE", "Dart", "Spitfire", "Suntop", "Beckom", "Flanker", "Lancer", "Sunmax", "Kittyhawk")),
         date = as.Date(Date1),
         rain = as.numeric(rain),
         # scale rain so that its maximum equals the number of factor levels (needed for Method 2)
         rain_scaled = rain * nlevels(element_f) / max(rain, na.rm = TRUE))
#> Warning: NAs introduced by coercion
Method 1 - combine plots using patchwork. I recommend this, so that you don't mix different classes on one y-axis.
# Gantt chart on top; the aesthetic must be lowercase `color`, otherwise ggplot2 ignores it
p1 <- ggplot(sowdate, aes(date, element_f, color = Category, group = Item)) +
  geom_line(size = 10) +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin = margin(b = 0))
# rainfall columns below
p2 <- ggplot(sowdate) +
  geom_col(aes(date, rain)) +
  theme(plot.margin = margin(t = 0))
# stack the two plots vertically
p1 + p2 + plot_layout(nrow = 2)
#> Warning: Removed 8 rows containing missing values (geom_path).
#> Warning: Removed 24 rows containing missing values (position_stack).
I removed the x-axis text, title and ticks from the first plot, and set the bottom margin of the upper plot and the top margin of the lower plot to zero to bring them closer together.
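Dropping the x-axis labels from the upper plot is only safe when both plots cover the same date range. One way to make that explicit is to give both plots the same date limits with patchwork's `&` operator. The sketch below uses toy data with a few values copied from the question; `windows`, `rainfall` and `x_limits` are illustrative names, and the 2-day bar width and padding are arbitrary choices, not part of the original answer.

```r
library(ggplot2)
library(patchwork)

# Toy data: two sowing windows and three rainfall events
windows <- data.frame(
  date    = as.Date(c("2018-05-07", "2018-05-31", "2018-06-01", "2018-06-30")),
  element = factor(c("Beckom", "Beckom", "Dart", "Dart"))
)
rainfall <- data.frame(
  date = as.Date(c("2018-03-30", "2018-06-28", "2018-09-10")),
  rain = c(7, 13.5, 5)
)

# One common date range, padded by 2 days so the outermost bars stay inside the limits
x_limits <- range(c(windows$date, rainfall$date)) + c(-2, 2)

p_top <- ggplot(windows, aes(date, element, group = element)) +
  geom_line(size = 10)
p_bottom <- ggplot(rainfall, aes(date, rain)) +
  geom_col(width = 2)   # bars 2 days wide

# `/` stacks the plots, `&` adds the date scale to every plot in the patchwork
combined <- (p_top / p_bottom) & scale_x_date(limits = x_limits)
combined
```

With identical limits in both panels, a rainfall column sits directly below the same date in the Gantt chart, so the labels of the lower axis can be read for both.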
Method 2 - combine different variable classes on one y-axis. I don't recommend this; it gets quite messy, as you can see above and below. You need to scale your rain values to your factor levels, so that the columns overlap the Gantt bars and don't get too long. This in turn requires a second y-axis: you have to make your factor levels numeric, then create breaks and labels for the left y-axis, then transform the rain values back to their real values for the right axis, and hope that the breaks more or less work. I don't think a second y-axis really helps in reading the graph.
max_rain <- max(sowdate$rain, na.rm = TRUE)
n_levels <- nlevels(sowdate$element_f)
breaks_ax <- seq_len(n_levels)           # one break per factor level
labels_ax <- levels(sowdate$element_f)   # label each break with its level name
ggplot(sowdate, aes(date, as.numeric(element_f), color = Category, group = Item)) +
  geom_line(size = 10) +
  geom_col(aes(date, rain_scaled)) +
  # left axis: factor levels; right axis: rain transformed back to real values
  scale_y_continuous(breaks = breaks_ax, labels = labels_ax,
                     sec.axis = sec_axis(~ . * max_rain / n_levels)) +
  labs(y = 'Element')
#> Warning: Removed 24 rows containing missing values (position_stack).
#> Warning: Removed 17 rows containing missing values (geom_path).
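The scaling behind the second axis in Method 2 can be checked in isolation. This is a small base R sketch using the nine rainfall values from the data; `n_levels`, `rain_scaled` and `rain_back` are names chosen for the illustration.

```r
rain     <- c(3, 7, 3.5, 4, 13.5, 3, 6, 23, 5)  # rainfall values from the data
n_levels <- 9                                    # number of Element factor levels
max_rain <- max(rain, na.rm = TRUE)

# Forward: squeeze rainfall into the 0..n_levels range of the left axis
rain_scaled <- rain * n_levels / max_rain

# Backward: the transformation the secondary axis applies to show real rainfall
rain_back <- rain_scaled * max_rain / n_levels

max(rain_scaled)             # the wettest day reaches the top factor level
all.equal(rain_back, rain)   # the round trip recovers the original values
```

The two transformations must be exact inverses of each other; if they are not, the right-hand axis labels no longer match the heights of the columns.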
Created on 2020-01-22 by the reprex package (v0.3.0)