I'm working with transcripts of speech:
Utterance Starttime_ms Endtime_ms
<chr> <dbl> <dbl>
1 on this 210 780
2 okay 3403 3728
3 cool thanks everyone um 4221 5880
4 so yes in terms of our projects 5910 11960
5 let's have a look so the 11980 13740
6 LGBTQ plus 13813 16110
and would like to insert after each Utterance
a new row indicating the time gap vis-à-vis the prior Utterance
. The desired output would look somewhat like this:
Utterance Starttime_ms Endtime_ms
<chr> <dbl> <dbl>
1 on this 210 780
NA 780 3403
2 okay 3403 3728
NA 3728 4221
3 cool thanks everyone um 4221 5880
NA 5880 5910
4 so yes in terms of our projects 5910 11960
NA 11960 11980
5 let's have a look so the 11980 13740
NA 13740 13813
6 LGBTQ plus 13813 16110
I know how to do it in data.table
:
library(data.table)
unq <- c(0, sort(unique(setDT(df)[, c(Starttime_ms, Endtime_ms)])))
df <- df[.(unq[-length(unq)], unq[-1]), on=c("Starttime_ms", "Endtime_ms")]
But am looking for a dplyr
solution.
Data:
df <- structure(list(Utterance = c("on this", "okay", "cool thanks everyone um",
"so yes in terms of our projects",
"let's have a look so the", "LGBTQ plus"), Starttime_ms = c(210,
3403, 4221, 5910, 11980, 13813), Endtime_ms = c(780, 3728, 5880,
11960, 13740, 16110)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
You can try the code below
df %>%
pivot_longer(-Utterance, values_to = "Starttime_ms") %>%
mutate(Endtime_ms = lead(Starttime_ms)) %>%
drop_na() %>%
select(-name) %>%
mutate(Utterance = replace(Utterance, !row_number() %% 2, NA_character_))
which gives
# A tibble: 11 × 3
Utterance Starttime_ms Endtime_ms
<chr> <dbl> <dbl>
1 on this 210 780
2 NA 780 3403
3 okay 3403 3728
4 NA 3728 4221
5 cool thanks everyone um 4221 5880
6 NA 5880 5910
7 so yes in terms of our projects 5910 11960
8 NA 11960 11980
9 let's have a look so the 11980 13740
10 NA 13740 13813
11 LGBTQ plus 13813 16110