I have a data frame of time-series data for multiple individuals. It records the dive intervals and surface intervals of each animal over time. For every surface interval, I would like to use ggplot to plot its duration against the duration of the dive immediately before it, where there is one. If there are two surface intervals in a row, the second one has no dive directly before it and should be ignored; I only want to plot surfacings that directly follow a dive. I'd like to do this separately for each individual ID. Example data is below:
I would prefer to use dplyr's group_by() to handle the individuals, but I'm not sure how to select each dive and pair it with the surfacing that immediately follows it.
```r
df <- data.frame(ID=c("A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B"),
What=c("Dive", "Surface", "Dive","Surface","Dive", "Surface", "Surface", "Dive", "Surface", "Dive", "Surface", "Dive", "Dive", "Surface", "Dive", "Surface", "Dive", "Surface"),
Start=c("2010-05-09 17:29:45", "2010-05-09 17:56:24", "2010-05-09 18:22:15", "2010-05-09 18:52:38", "2010-05-09 18:59:02", "2010-05-09 19:24:37","2010-05-09 19:30:00", "2010-05-09 19:30:57", "2010-05-09 19:48:00","2010-05-03 18:49:35", "2010-05-03 18:58:00", "2010-05-03 19:27:51","2010-05-03 19:35:42", "2010-05-03 20:15:41", "2010-05-03 20:24:13","2010-05-03 20:53:32", "2010-05-03 21:01:31", "2010-05-03 21:40:26"),
End=c("2010-05-09 17:56:24", "2010-05-09 18:22:15", "2010-05-09 18:52:38","2010-05-09 18:59:02", "2010-05-09 19:24:37", "2010-05-09 19:29:28","2010-05-09 19:30:57", "2010-05-09 19:48:00", "2010-05-09 19:49:02", "2010-05-03 18:58:06", "2010-05-03 19:27:51", "2010-05-03 19:35:42", "2010-05-03 20:15:41", "2010-05-03 20:24:13", "2010-05-03 20:53:32", "2010-05-03 21:01:31", "2010-05-03 21:40:26", "2010-05-03 21:48:44"),
Duration = c(26.65, 25.85, 30.38, 6.40, 25.58, 4.85, 0.95, 17.05, 1.03, 8.52, 29.85, 7.85, 39.98, 8.53, 29.32, 7.98, 38.92, 8.30))
df$Start <- as.POSIXct(df$Start, format = "%Y-%m-%d %H:%M:%S")
df$End <- as.POSIXct(df$End, format = "%Y-%m-%d %H:%M:%S")
```
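Side note: the Duration column appears to be End minus Start expressed in minutes (for example, 17:29:45 to 17:56:24 is 26 min 39 s, i.e. 26.65 min). A minimal sketch of recomputing it with base R's difftime(), using only the first three intervals of individual A and assuming the timestamps are UTC; the names `start`, `end` and `duration_min` are illustrative:

```r
# Start/End timestamps of the first three intervals of individual A (from the example data)
start <- as.POSIXct(c("2010-05-09 17:29:45", "2010-05-09 17:56:24", "2010-05-09 18:22:15"),
                    format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
end <- as.POSIXct(c("2010-05-09 17:56:24", "2010-05-09 18:22:15", "2010-05-09 18:52:38"),
                  format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# difftime() with units = "mins" gives each interval's length in minutes;
# rounding to 2 decimals matches the precision of the Duration column
duration_min <- round(as.numeric(difftime(end, start, units = "mins")), 2)
print(duration_min)
```

These match the first three Duration values (26.65, 25.85, 30.38), which suggests both axes of the plot are in minutes.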
I would like to make a ggplot with surface duration on the x axis and the duration of the preceding dive on the y axis. If there are two dives in a row, the first one should be ignored and the second one plotted against the following surfacing. Likewise, if there are several surfacings in a row, I only want the one that comes right after a dive.
Any assistance would be much appreciated!
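I'm not 100% sure what you're trying to do, but if I understand correctly, some manipulation gets you an eight-row data frame with the four dive-surface pairs for each of the two individuals. Within each ID, keep a dive only if the next interval is a surfacing, and keep a surfacing only if the previous interval is a dive. The remaining rows then alternate Dive, Surface, so consecutive rows can be numbered as pairs and spread into columns:

```r
library(dplyr)
library(tidyr)

df2 <-
  df %>%
  group_by(ID) %>%
  # keep a dive only if a surfacing follows it, and a surfacing only if a dive precedes it
  filter((What == "Dive" & lead(What) == "Surface") |
           (What == "Surface" & lag(What) == "Dive")) %>%
  select(ID, What, Duration) %>%
  # the kept rows alternate Dive, Surface, so each pair shares one number
  mutate(dive_number = ceiling(row_number() / 2)) %>%
  ungroup() %>%
  spread(What, Duration)
```

Printing `df2` gives:

```
# A tibble: 8 x 4
  ID    dive_number  Dive Surface
  <fct>       <dbl> <dbl>   <dbl>
1 A               1 26.6    25.8
2 A               2 30.4     6.4
3 A               3 25.6     4.85
4 A               4 17.0     1.03
5 B               1  8.52   29.8
6 B               2 40.0     8.53
7 B               3 29.3     7.98
8 B               4 38.9     8.3
```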
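Then you can plot the results:

```r
library(ggplot2)

df2 %>%
  ggplot(aes(x = Surface, y = Dive, color = ID)) +
  geom_point() +
  labs(x = "Surface interval duration (min)", y = "Preceding dive duration (min)")
```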
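An alternative sketch that skips the pair-numbering step: within each ID, look one row back with dplyr's lag() and keep only surfacings whose previous interval is a dive. This does not depend on the kept rows alternating. The data below repeats the ID, What and Duration columns from the question; the names `dives`, `pairs`, `prev_what`, `prev_duration` and `p` are illustrative, not part of any package.

```r
library(dplyr)
library(ggplot2)

# ID, What and Duration from the question's example data
dives <- data.frame(
  ID = rep(c("A", "B"), each = 9),
  What = c("Dive", "Surface", "Dive", "Surface", "Dive", "Surface", "Surface", "Dive", "Surface",
           "Dive", "Surface", "Dive", "Dive", "Surface", "Dive", "Surface", "Dive", "Surface"),
  Duration = c(26.65, 25.85, 30.38, 6.40, 25.58, 4.85, 0.95, 17.05, 1.03,
               8.52, 29.85, 7.85, 39.98, 8.53, 29.32, 7.98, 38.92, 8.30)
)

pairs <- dives %>%
  group_by(ID) %>%
  # look one interval back, separately within each individual
  mutate(prev_what = lag(What), prev_duration = lag(Duration)) %>%
  # keep surfacings whose immediately preceding interval is a dive;
  # filter() drops rows where the condition is NA (the first row of each ID)
  filter(What == "Surface", prev_what == "Dive") %>%
  ungroup() %>%
  select(ID, Surface = Duration, Dive = prev_duration)

p <- ggplot(pairs, aes(x = Surface, y = Dive, color = ID)) +
  geom_point() +
  labs(x = "Surface interval duration (min)", y = "Preceding dive duration (min)")
print(p)
```

Because every output row is built from a single surfacing and the row directly before it, runs of consecutive dives or surfacings cannot shift the pairing.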