I am looking to convert the following R data frame into one that is indexed by seconds and have no idea how to do it. Maybe dcast but then in confused on how to expand out the word that's being spoken.
startTime endTime word
1 1.900s 2.300s hey
2 2.300s 2.800s I'm
3 2.800s 3s John
4 3s 3.400s right
5 3.400s 3.500s now
6 3.500s 3.800s I
7 3.800s 4.300s help
Time word
1.900s hey
2.000s hey
2.100s hey
2.200s hey
2.300s I'm
2.400s I'm
2.500s I'm
2.600s I'm
2.700s I'm
2.800s John
2.900s John
3.000s right
3.100s right
3.200s right
3.300s right
One solution can be achieved using tidyr::expand
.
EDITED: Based on feedback from OP, as his data got duplicate startTime
library(tidyverse)
step = 0.1
df %>% group_by(rnum = row_number()) %>%
expand(Time = seq(startTime, max(startTime, (endTime-step)), by=step), word = word) %>%
arrange(Time) %>%
ungroup() %>%
select(-rnum)
# # A tibble: 24 x 2
# # Groups: word [7]
# Time word
# <dbl> <chr>
# 1 1.90 hey
# 2 2.00 hey
# 3 2.10 hey
# 4 2.20 hey
# 5 2.30 I'm
# 6 2.40 I'm
# 7 2.50 I'm
# 8 2.60 I'm
# 9 2.70 I'm
# 10 2.80 John
# ... with 14 more rows
Data
df <- read.table(text =
"startTime endTime word
1.900 2.300 hey
2.300 2.800 I'm
2.800 3 John
3 3.400 right
3.400 3.500 now
3.500 3.800 I
3.800 4.300 help",
header = TRUE, stringsAsFactors = FALSE)