Search code examples
rtimedata-cleaning

Linking values occurring within the same time window in R


Problem: need to add values from one dataframe to another depending on the time window in which each row occurs.

I have one dataframe with a list of singular events like this:

  Ind       Date     Time Event
1 FAU 15/11/2016 06:40:43     A
2 POR 15/11/2016 12:26:51     V
3 POR 15/11/2016 14:52:53     B
4 MAM 20/11/2016 08:12:19     G
5 SUR 03/12/2016 13:51:18     A
6 SUR 14/12/2016 07:47:06     V

And a second data frame with ongoing, continuous events linked like this:

         Date     Time Event
1  15/11/2016 06:56:48     1
2  15/11/2016 06:59:40     2
3  15/11/2016 07:27:36     3
4  15/11/2016 07:29:10     4
5  15/11/2016 07:34:51     5
6  15/11/2016 07:35:10     6
7  15/11/2016 07:37:19     7
8  15/11/2016 07:39:55     8
9  15/11/2016 07:51:59     9
10 15/11/2016 08:00:13    10
11 15/11/2016 08:08:01    11
12 15/11/2016 08:13:21    12
13 15/11/2016 08:16:21    13
14 15/11/2016 12:14:48    14
15 15/11/2016 12:16:58    15
16 15/11/2016 12:51:22    16
17 15/11/2016 12:52:09    17
18 15/11/2016 13:26:29    18
19 15/11/2016 13:26:55    19
20 15/11/2016 13:34:14    20
21 15/11/2016 13:50:41    21
22 15/11/2016 13:53:25    22
23 15/11/2016 14:15:17    23
24 15/11/2016 14:54:49    24

Question: how can I combine these so that for the singular events we can see during which continuous events they occurred, for example, something like this:

Ind       Date     Time Eventx   Eventy
1 FAU 15/11/2017 06:40:43     A   1
2 POR 15/11/2017 12:26:51     V   15
3 POR 15/11/2017 14:52:53     B   23

Many thanks


Solution

  • I can provide you with a data.table solution. The only issue is that I had to move the start of the first event in the second dataframe to an earlier date, since it was after the starting time of the first event of the first dataframe. You'll need the additional packages data.table and lubridate.

    library(data.table)
    library(lubridate)
    dt1 <- data.table(df1)
    dt2 <- data.table(df2)
    
    dt1[, Date.Time := as.POSIXct(strptime(paste(Date, Time, sep = " "), "%d/%m/%Y %H:%M:%S"))]
    dt2[, Date.Time := as.POSIXct(strptime(paste(Date, Time, sep = " "), "%d/%m/%Y %H:%M:%S"))]
    
    
    # Create the start and end time columns in the second data.table
    dt2[, `:=`(Start.Time = Date.Time
            , End.Time = shift(Date.Time, n = 1L, fill = NA, type = "lead"))]
    
    # Change the start date to an earlier one
    dt2[Event == 1,`:=`(Start.Time = Start.Time - days(1)) ]
    
    # Merge on multiple conditions and the selection of the relevant columns
    dt2[dt1, on=.(Start.Time < Date.Time
                  , End.Time > Date.Time)
                  , nomatch = 0L][,.(Ind
                       , Date
                       , Time
                       , Eventx = i.Event
                       , Eventy = Event)]
    # Output of the last merge
       Ind       Date     Time Eventx Eventy
    1: FAU 15/11/2016 06:56:48      A      1
    2: POR 15/11/2016 12:16:58      V     15
    3: POR 15/11/2016 14:15:17      B     23