Tags: r, ggplot2, scale, gganimate

R - How to plot ggplot2 with two y axes on different scales (with time variables)


I can make a ggplot2 plot with the same x-axis (say, year) but different y-axes (on very different scales). Is it possible to use gganimate to animate two lines, each corresponding to its own y-axis? I have been able to create two lines using the same y-axis, but can't figure out how to use two axes.

I think the issue in my particular case might be related to the fact that my y-axis variables are in POSIXct format.

Say I create a dataset as follows:

library(ggplot2)
library(gganimate)
library(htmltab)
library(lubridate)

#marathon
data0 <- htmltab("https://en.wikipedia.org/wiki/Marathon_world_record_progression",1)
data <- data0[,c(1,4)]
#remove ones that are ARRS only
data <- data[-c(9,12,13,22,27,33,34,35,36,51),]
#data <- data %>% mutate(time = Time %>% hms())
data$time2 <- as.POSIXct(data$Time, format = "%H:%M:%S")
data$date <- mdy(data$Date)
data$race <- "Marathon"

#mile
mile0 <- htmltab("https://en.wikipedia.org/wiki/Mile_run_world_record_progression",4)
mile <- mile0[,c(1,4)]
#mile <- mile0 %>% mutate(time = Time %>% ms())
mile$time2 <-  as.POSIXct(mile$Time, format = "%M:%S")
mile$date <- dmy(mile$Date)
mile$race <- "Mile"

marathon <- data[,c(3,4)]
names(marathon)[1]<-"marathon"

mile2 <- mile[,c(3,4)]
names(mile2)[1]<-"mile"
a <- merge(marathon, mile2, by="date", all=TRUE)

I can then get a gganimate animation to work as follows:

ggplot(a) +
    geom_point(aes(x=date, y=marathon, group=date, color="blue")) +
    geom_point(aes(x=date, y=mile, group=date, color="red")) +
    scale_y_continuous(sec.axis = sec_axis(~./152, name = "CDF"), breaks=seq(0,150,25)) +
    transition_reveal(date)

The problem is that the two are on very different scales (one is about 2-3 hours, while the other is about 2.5-3.5 minutes). How can I get them on the same scale? If they were in a normal format, I might be able to do something like the following:

ggplot(a) +
    geom_point(aes(x=date, y=marathon, group=date, color="blue")) +
    geom_point(aes(x=date, y=mile*65, group=date, color="red")) +
    scale_y_continuous(sec.axis = sec_axis(~./65, name = "Mile"), breaks=seq(0,150,25)) +
    transition_reveal(date)

However, I get an error due to the POSIX format the y-variables are in. What should I do? (Ideally, I would like to get them on scales so that the vertical range of each variable basically fills the vertical distance.)

For reference, here is the result from the plot that I want to fix:

[Image: the resulting plot]

I fear that this may not be possible. See https://ggplot2.tidyverse.org/reference/sec_axis.html:

"As of v3.1, date and datetime scales have limited secondary axis capabilities. Unlike other continuous scales, secondary axis transformations for date and datetime scales must respect their primary POSIX data structure. This means they may only be transformed via addition or subtraction, e.g. ~ . + hms::hms(days = 8), or ~ . - 8*60*60. Nonlinear transformations will return an error. To produce a time-since-event secondary axis in this context, users may consider adapting secondary axis labels."
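One way to sidestep this restriction is to plot plain numbers instead of POSIXct values and format the tick labels as times, which is roughly what "adapting secondary axis labels" suggests. The following is a minimal base-R sketch under that assumption; `to_decimal_hours` and `fmt_hms` are hypothetical helper names, not part of ggplot2 or lubridate, and the sample time is only an illustration.

```r
# Hypothetical helpers (not from any package).

# Convert a POSIXct time-of-day to decimal hours, ignoring the date part.
to_decimal_hours <- function(x) {
  lt <- as.POSIXlt(x)
  lt$hour + lt$min / 60 + lt$sec / 3600
}

# Format decimal hours as "H:MM:SS" strings, e.g. for axis labels.
fmt_hms <- function(h) {
  total <- round(h * 3600)  # whole seconds
  sprintf("%d:%02d:%02d", total %/% 3600, (total %% 3600) %/% 60, total %% 60)
}

# Illustrative value only; tz is fixed so the result does not depend on locale.
t <- as.POSIXct("2:01:39", format = "%H:%M:%S", tz = "UTC")
hours <- to_decimal_hours(t)
label <- fmt_hms(hours)
```

Because the y values are then ordinary numbers, `scale_y_continuous()` and `sec_axis()` can use multiplicative transformations, and a function such as `fmt_hms` can be passed as the `labels` argument.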


Solution

  • One approach would be to convert the time to decimal hours (or minutes, etc.) and adjust the scale labels:

    library(dplyr);  library(lubridate)
    a %>%
      # tidyr::gather(type, value, -date) %>%   # older equivalent
      tidyr::pivot_longer(-date, names_to = "type", values_to = "value") %>%   # Preferred syntax since tidyr 1.0.0
      mutate(time_dec = hour(value) + minute(value)/60 + second(value)/3600,
             time_scaled = time_dec * if_else(type == "mile", 30, 1)) %>% 
      ggplot() +
      geom_point(aes(x=date, y=time_scaled, group=value, color = type)) +
      scale_y_continuous(breaks = 0:3, 
                         labels = c("0", "1:00", "2:00", "3:00"),
                         name = "Marathon",
                         sec.axis = sec_axis(~./30, 
                                             name = "Mile", 
                                             breaks = (1/60)*0:100,
                                             labels = 0:100)) +
      expand_limits(y = c(1.5,3)) +
      transition_reveal(date)
    

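The factor of 30 in the code above was picked by eye. If the goal is for each series to fill the vertical range, a linear map (scale plus offset) can be derived from the two ranges instead. This is a sketch only: `fit_linear_map` is a hypothetical helper, and the numbers are made-up decimal hours, not the actual record data.

```r
# Hypothetical helper: find a and b such that a * secondary + b
# spans the same range as primary.
fit_linear_map <- function(primary, secondary) {
  rp <- range(primary, na.rm = TRUE)
  rs <- range(secondary, na.rm = TRUE)
  a <- diff(rp) / diff(rs)
  b <- rp[1] - a * rs[1]
  list(a = a, b = b)
}

# Illustrative values only (decimal hours).
marathon_h <- c(2.92, 2.50, 2.10, 2.03)
mile_h     <- c(4.2, 4.0, 3.8, 3.7) / 60

m <- fit_linear_map(marathon_h, mile_h)
mile_scaled <- m$a * mile_h + m$b   # now spans the marathon range

# In the plot, the mile series would use y = m$a * time_dec + m$b,
# and the secondary axis would undo the map (assumed usage, untested
# here with gganimate):
#   sec.axis = sec_axis(~ (. - m$b) / m$a, name = "Mile")
```

With an offset as well as a factor, the secondary axis no longer shares a zero with the primary one, so the `breaks` and `labels` for the mile axis would need to be chosen for the range actually shown.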