I am dealing with the following problem: I have 2 data frames, a and b:
#dataframe a
ID1 <- c("1", "1", "1", "1", "2", "2", "2")
time<- c("2022-04-12 08:52", "2022-04-12 15:34", "2022-04-12 16:45", "2022-04-12 22:23", "2022-04-12 02:15", "2022-04-12 05:24", "2022-04-12 14:55")
a <- data.frame(ID1, time)
a$time <- as.POSIXct(a$time, origin="1970-01-01")
a
ID1 time
1 2022-04-12 08:52:00
1 2022-04-12 15:34:00
1 2022-04-12 16:45:00
1 2022-04-12 22:23:00
2 2022-04-12 02:15:00
2 2022-04-12 05:24:00
2 2022-04-12 14:55:00
#dataframe b
ID2 <- c("1", "1", "1", "1", "2", "2", "2", "2")
start <- c("2022-04-12 00:00", "2022-04-12 08:00", "2022-04-12 17:00", "2022-04-12 18:00", "2022-04-12 00:00", "2022-04-12 15:00", "2022-04-12 16:00", "2022-04-12 19:00")
end <- c("2022-04-12 08:00", "2022-04-12 17:00", "2022-04-12 18:00", "2022-04-12 00:00", "2022-04-12 15:00 ", "2022-04-12 16:00", "2022-04-12 19:00", "2022-04-12 00:00")
activity <- c("At home", "Work", "Travel", "Home", "Home", "Travel", "Work", "Home")
b <- data.frame(ID2, start, end, activity)
b$start <- as.POSIXct(b$start, origin="1970-01-01")
b$end <- as.POSIXct(b$end, origin="1970-01-01")
b
ID2 start end activity
1 2022-04-12 00:00:00 2022-04-12 08:00:00 At home
1 2022-04-12 08:00:00 2022-04-12 17:00:00 Work
1 2022-04-12 17:00:00 2022-04-12 18:00:00 Travel
1 2022-04-12 18:00:00 2022-04-12 00:00:00 Home
2 2022-04-12 00:00:00 2022-04-12 15:00:00 Home
2 2022-04-12 15:00:00 2022-04-12 16:00:00 Travel
2 2022-04-12 16:00:00 2022-04-12 19:00:00 Work
2 2022-04-12 19:00:00 2022-04-12 00:00:00 Home
I want to merge over ID and over point in time (when a happens to occur in b (between start and end). I would like to end up with a data frame looking like this:
ID3 <- c("1", "1", "1", "1", "2", "2", "2")
time3<- c("2022-04-12 08:52", "2022-04-12 15:34", "2022-04-12 16:45", "2022-04-12 22:23", "2022-04-12 02:15", "2022-04-12 05:24", "2022-04-12 14:55")
activity3 <- c( "Work", "Work", "Work", "Home", "Home", "Home", "Home")
result <- data.frame(ID3, time3, activity3)
result
ID3 time3 activity3
1 2022-04-12 08:52 Work
1 2022-04-12 15:34 Work
1 2022-04-12 16:45 Work
1 2022-04-12 22:23 Home
2 2022-04-12 02:15 Home
2 2022-04-12 05:24 Home
2 2022-04-12 14:55 Home
I am very thankful for any help on this topic.
First of all, you need to change end
column of dataframe b
, lines 4 and 8. Since you're assuming that a new day is starting, day is not 2022-04-12 00:00:00
, but 2022-04-13 00:00:00
.
If you prefer, you can change lines 4 and 8, column end
from 2022-04-12 00:00:00
to 2022-04-12 23:59:59
. Both options are fine.
Once you've fixed time discrepancy, follow this code:
ID3 <- c();
time3 <- c();
activt <- c();
values <- c();
for (ide_number in unique(a$ID1)){ # let's start subseting by ID number
c <- subset(a,a$ID1==ide_number)
d <- subset(b,b$ID2==ide_number)
for (i in 1:nrow(c)){ # then, find the row that matches time condition
condition <- which(d$start<=c$time[i] & d$end>c$time[i]) # find rows
values <- append(values,condition) # rows in a vector
activt <- append(activt,d$activity[condition]) # activity vector
time3 <- append(time3,as.character(c$time[i])) # time vector
ID3 <- append(ID3,ide_number) # ID vector
}
}
out <- as.data.frame(cbind(ID3,time3,activt)) # just build your output
Output:
> out
ID3 time3 activt
1 1 2022-04-12 08:52:00 Work
2 1 2022-04-12 15:34:00 Work
3 1 2022-04-12 16:45:00 Work
4 1 2022-04-12 22:23:00 Home
5 2 2022-04-12 02:15:00 Home
6 2 2022-04-12 05:24:00 Home
7 2 2022-04-12 14:55:00 Home
>