I need to join two datasets together:
>allsecsHR
timestamp HRbpm
2023-03-22 09:04:53 101
2023-03-22 09:04:54 124
2023-03-22 09:04:55 103
2023-03-22 09:04:56 111
2023-03-22 09:04:57 112
2023-03-22 09:04:58 143
2023-03-22 09:04:59 109
2023-03-22 09:05:00 129
2023-03-22 09:05:01 122
2023-03-22 09:05:02 125
2023-03-22 09:05:03 110
>bhr
Behaviour BhrTimeStart BhrTimeEnd
Forage 2023-03-22 09:04:53 2023-03-22 09:04:58
Vigilance 2023-03-22 09:04:58 2023-03-22 09:05:03
Forage 2023-03-22 09:05:03 2023-03-22 09:05:10
At the end I would like a dataset that has each row representing one second, with the behaviour that was being performed per second
I have tried to do this in mySQL and in R using sqldf and powerjoin but it keeps losing connection to the sql server after running for 2 hours or in R it does not complete and crashes or returns Error: vector memory exhausted (limit reached?)
. I would really appreciate some help to find an efficient way to do this!
I think a data.table solution might be the fastest but I am not sure how to do it, a tidy solution would also be great!
So far I have tried:
mySQL/sqldf
library(sqldf)
sqldf("select * from allsecsHR
left join bhr
on allsecsHR.timestamp between bhr.BhrTimeStart and bhr.BhrTimeEnd")
powerjoin
library(powerjoin)
power_left_join(
allsecsHR, bhr,
by = ~.x$timestamp > .y$BhrTimeStart &
(.x$timestamp < .y$BhrTimeEnd | is.na(.y$BhrTimeEnd)),
keep = "left")
The output table I would like is this:
timestamp HRbpm Bhr BhrTimeStart BhrTimeEnd
2023-03-22 09:04:53 101 Forage 2023-03-22 09:04:53 2023-03-22 09:04:58
2023-03-22 09:04:54 124 Forage 2023-03-22 09:04:53 2023-03-22 09:04:58
2023-03-22 09:04:55 103 Forage 2023-03-22 09:04:53 2023-03-22 09:04:58
2023-03-22 09:04:56 111 Forage 2023-03-22 09:04:53 2023-03-22 09:04:58
2023-03-22 09:04:57 112 Forage 2023-03-22 09:04:53 2023-03-22 09:04:58
2023-03-22 09:04:58 143 Vigilance 2023-03-22 09:04:58 2023-03-22 09:05:03
2023-03-22 09:04:59 109 Vigilance 2023-03-22 09:04:58 2023-03-22 09:05:03
2023-03-22 09:05:00 129 Vigilance 2023-03-22 09:04:58 2023-03-22 09:05:03
2023-03-22 09:05:01 122 Vigilance 2023-03-22 09:04:58 2023-03-22 09:05:03
2023-03-22 09:05:02 125 Vigilance 2023-03-22 09:04:58 2023-03-22 09:05:03
2023-03-22 09:05:03 110 Forage 2023-03-22 09:05:03 2023-03-22 09:05:10
If all timestamps in allsecsHR
have a corresponding interval in bhr
:
library(data.table)
setDT(bhr)[setDT(allsecsHR)[, ts := timestamp], on = .(BhrTimeStart == ts), roll = TRUE]
#> Bhr BhrTimeStart BhrTimeEnd timestamp HRbpm
#> 1: Forage 2023-03-22 09:04:53 2023-03-22 09:04:58 2023-03-22 09:04:53 101
#> 2: Forage 2023-03-22 09:04:54 2023-03-22 09:04:58 2023-03-22 09:04:54 124
#> 3: Forage 2023-03-22 09:04:55 2023-03-22 09:04:58 2023-03-22 09:04:55 103
#> 4: Forage 2023-03-22 09:04:56 2023-03-22 09:04:58 2023-03-22 09:04:56 111
#> 5: Forage 2023-03-22 09:04:57 2023-03-22 09:04:58 2023-03-22 09:04:57 112
#> 6: Vigilance 2023-03-22 09:04:58 2023-03-22 09:05:03 2023-03-22 09:04:58 143
#> 7: Vigilance 2023-03-22 09:04:59 2023-03-22 09:05:03 2023-03-22 09:04:59 109
#> 8: Vigilance 2023-03-22 09:05:00 2023-03-22 09:05:03 2023-03-22 09:05:00 129
#> 9: Vigilance 2023-03-22 09:05:01 2023-03-22 09:05:03 2023-03-22 09:05:01 122
#> 10: Vigilance 2023-03-22 09:05:02 2023-03-22 09:05:03 2023-03-22 09:05:02 125
#> 11: Forage 2023-03-22 09:05:03 2023-03-22 09:05:10 2023-03-22 09:05:03 110
If not all timestamps in allsecsHR
have a corresponding interval in bhr
, a non-equi join will work:
setDT(bhr)[setDT(allsecsHR)[, ts := timestamp], on = .(BhrTimeStart <= ts, BhrTimeEnd > ts)]
Data:
allsecsHR <- structure(list(timestamp = structure(c(1679490293, 1679490294,
1679490295, 1679490296, 1679490297, 1679490298, 1679490299, 1679490300,
1679490301, 1679490302, 1679490303), class = c("POSIXct", "POSIXt"
), tzone = ""), HRbpm = c(101L, 124L, 103L, 111L, 112L, 143L,
109L, 129L, 122L, 125L, 110L)), row.names = c(NA, -11L), class = c("data.table", "data.frame"))
bhr <- structure(list(Bhr = c("Forage", "Vigilance", "Forage"), BhrTimeStart = structure(c(1679490293,
1679490298, 1679490303), class = c("POSIXct", "POSIXt"), tzone = ""),
BhrTimeEnd = structure(c(1679490298, 1679490303, 1679490310
), class = c("POSIXct", "POSIXt"), tzone = "")), row.names = c(NA,
-3L), class = c("data.table", "data.frame"))