r ggplot2 dplyr time-series sequential

R - Plot pairs of data from pairs of sequential rows, where available, in a data frame


I have a data frame of time series data for multiple individuals. The data consist of surface intervals and dive intervals of animals over time for each individual. For every surface interval, I would like to use ggplot to plot the duration of the surface interval against the duration of the previous dive, where available. If there are two surface intervals in a row, I'd like to ignore them and plot only surfacings that have a dive directly before them. I'd like to do this per individual ID.

I would prefer to use the dplyr group_by() function for individuals, but I'm not sure how to select each dive and pair it with the following (subsequent) surfacing. I have supplied some example data below:

df <- data.frame(ID=c("A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B"), 
What=c("Dive", "Surface", "Dive","Surface","Dive", "Surface", "Surface", "Dive", "Surface", "Dive", "Surface", "Dive", "Dive", "Surface", "Dive", "Surface", "Dive", "Surface"), 
Start=c("2010-05-09 17:29:45", "2010-05-09 17:56:24", "2010-05-09 18:22:15", "2010-05-09 18:52:38", "2010-05-09 18:59:02", "2010-05-09 19:24:37","2010-05-09 19:30:00", "2010-05-09 19:30:57", "2010-05-09 19:48:00","2010-05-03 18:49:35", "2010-05-03 18:58:00", "2010-05-03 19:27:51","2010-05-03 19:35:42", "2010-05-03 20:15:41", "2010-05-03 20:24:13","2010-05-03 20:53:32", "2010-05-03 21:01:31", "2010-05-03 21:40:26"), 
End=c("2010-05-09 17:56:24", "2010-05-09 18:22:15", "2010-05-09 18:52:38","2010-05-09 18:59:02", "2010-05-09 19:24:37", "2010-05-09 19:29:28","2010-05-09 19:30:57", "2010-05-09 19:48:00", "2010-05-09 19:49:02", "2010-05-03 18:58:06", "2010-05-03 19:27:51", "2010-05-03 19:35:42", "2010-05-03 20:15:41", "2010-05-03 20:24:13", "2010-05-03 20:53:32", "2010-05-03 21:01:31", "2010-05-03 21:40:26", "2010-05-03 21:48:44"), 
Duration = c(26.65, 25.85, 30.38,  6.40, 25.58,  4.85,  0.95, 17.05, 1.03,  8.52, 29.85,  7.85, 39.98,  8.53, 29.32,  7.98, 38.92,  8.30))

df$Start<-as.POSIXct(df$Start, format = "%Y-%m-%d %H:%M:%S")
df$End<-as.POSIXct(df$End, format = "%Y-%m-%d %H:%M:%S")

I would like to make a ggplot with surface duration on the x axis and the duration of the previous dive on the y axis. If there are two dives in a row, ignore the first one and plot the second one against the next surfacing; the same goes for multiple surfacings: I'd just like to pick the surfacings that have a dive right before them.

Any assistance would be much appreciated!


Solution

  • I'm not 100% sure of what you're trying to do, but if I understand correctly... we can do some manipulation to get an eight-row data frame with the four dive-surface pairs for each of the two individuals:

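    library(dplyr)
    library(tidyr)

    df2 <- 
      df %>% 
      group_by(ID) %>% 
      # keep a row only if the next row is of the other type (or it is the last row),
      # so in a run of identical rows only the last one survives
      filter(What != lead(What) | is.na(lead(What))) %>% 
      select(ID, What, Duration) %>% 
      # number the remaining rows in pairs: 1, 1, 2, 2, ...
      mutate(dive_number = ceiling(row_number() / 2)) %>% 
      ungroup() %>% 
      # one row per pair, with Dive and Surface as separate columns
      spread(What, Duration)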
    
    # A tibble: 8 x 4
      ID    dive_number  Dive Surface
      <fct>       <dbl> <dbl>   <dbl>
    1 A               1 26.6    25.8 
    2 A               2 30.4     6.4 
    3 A               3 25.6     0.95
    4 A               4 17.0     1.03
    5 B               1  8.52   29.8 
    6 B               2 40.0     8.53
    7 B               3 29.3     7.98
    8 B               4 38.9     8.3 
    

    Then you can plot the results:

    library(ggplot2)

    df2 %>% 
      ggplot(aes(x = Surface, y = Dive, color = ID)) +
      geom_point()
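
One thing to be aware of: with the `lead()` filter, a run of consecutive surfacings keeps the last surfacing of the run, so for individual A the 25.58 dive ends up paired with the 0.95 surfacing rather than the 4.85 one that immediately follows it. If "a dive right before them" is meant literally, a possible alternative is to look backwards with `lag()` instead. The following is only a sketch under that reading of the question. The `toy` data frame is a small subset of the question's rows, and the names `toy`, `pairs`, `prev_what` and `prev_dive` are made up for illustration. It also assumes rows are already ordered by start time within each ID.

```r
library(dplyr)
library(ggplot2)

# A few rows taken from the question's data (A: rows 5-8, B: rows 3-5),
# assumed to be sorted by start time within each ID
toy <- data.frame(
  ID       = c("A", "A", "A", "A", "B", "B", "B"),
  What     = c("Dive", "Surface", "Surface", "Dive", "Dive", "Dive", "Surface"),
  Duration = c(25.58, 4.85, 0.95, 17.05, 7.85, 39.98, 8.53),
  stringsAsFactors = FALSE
)

pairs <- toy %>%
  group_by(ID) %>%
  # attach the type and duration of the previous row within each individual
  mutate(
    prev_what = dplyr::lag(What),
    prev_dive = dplyr::lag(Duration)
  ) %>%
  ungroup() %>%
  # keep only surfacings whose immediately preceding row is a dive
  filter(What == "Surface", prev_what == "Dive") %>%
  select(ID, Surface = Duration, Dive = prev_dive)

# same plot as above, built from the lag-based pairs
p <- ggplot(pairs, aes(x = Surface, y = Dive, color = ID)) +
  geom_point()
```

With this version a surfacing that follows another surfacing is dropped, a dive that is followed by another dive is never paired, and a leading surfacing or trailing dive simply falls out, so no pair counter is needed. On the full data you would replace `toy` with `df`, probably after `arrange(ID, Start)`.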