Search code examples
rdataframedatetimedplyr

Insert appropriate number of rows into data frame based on date differences


I have a data frame as below

df1 <- read.table(text = "entity_id state        last_changed DT.diff
sensor.kincony02_temperature03  20.4 '2025-02-04 23:00:15'  15.188
sensor.kincony02_temperature03  20.3 '2025-02-04 23:08:15' 479.849
sensor.kincony02_temperature03  20.2 '2025-02-04 23:10:15' 120.115
sensor.kincony02_temperature03  20.3 '2025-02-04 23:15:15' 300.136
sensor.kincony02_temperature03  20.4 '2025-02-04 23:18:15' 180.020
sensor.kincony02_temperature03  20.5 '2025-02-04 23:21:15' 180.020
sensor.kincony02_temperature03  20.6 '2025-02-04 23:22:15'  59.904
sensor.kincony02_temperature03  20.7 '2025-02-04 23:23:15'  59.904
sensor.kincony02_temperature03  20.8 '2025-02-04 23:25:15' 120.115
sensor.kincony02_temperature03  20.9 '2025-02-04 23:27:15' 119.809
sensor.kincony02_temperature03  21.0 '2025-02-04 23:30:15' 179.979
sensor.kincony02_temperature03  21.1 '2025-02-04 23:31:15'  60.252
sensor.kincony02_temperature03  21.2 '2025-02-04 23:35:15' 239.921
sensor.kincony02_temperature03  21.3 '2025-02-04 23:46:15' 659.865
sensor.kincony02_temperature03  21.2 '2025-02-04 23:47:15'  60.008
sensor.kincony02_temperature03  21.1 '2025-02-04 23:51:15' 240.025
sensor.kincony02_temperature03  21.2 '2025-02-04 23:53:15' 120.218
sensor.kincony02_temperature03  21.1 '2025-02-04 23:54:15'  59.903
sensor.kincony02_temperature03  21.0 '2025-02-05 00:02:15' 479.803
sensor.kincony02_temperature03  20.9 '2025-02-05 00:06:15' 239.999
sensor.kincony02_temperature03  20.8 '2025-02-05 00:11:15' 300.007
sensor.kincony02_temperature03  20.7 '2025-02-05 00:13:15' 119.997
sensor.kincony02_temperature03  20.6 '2025-02-05 00:14:15'  60.008
sensor.kincony02_temperature03  20.5 '2025-02-05 00:15:15'  60.002
sensor.kincony02_temperature03  20.4 '2025-02-05 00:17:15' 119.999
sensor.kincony02_temperature03  20.3 '2025-02-05 00:19:15' 119.996
sensor.kincony02_temperature03  20.2 '2025-02-05 00:20:15'  59.998
sensor.kincony02_temperature03  20.1 '2025-02-05 00:24:15' 240.009
sensor.kincony02_temperature03  20.0 '2025-02-05 00:27:15' 179.997", header = TRUE) 

I need to add rows so that the data frame contains values ​​for each minute. Currently, the device only records the moment of change of the measured value, which means that between the recorded times the measured value did not change.

The expected data frame for the first few rows should look like this:

df1.1 <- read.table(text = "entity_id state        last_changed 
sensor.kincony02_temperature03  20.4 '2025-02-04 23:00:15'
sensor.kincony02_temperature03  20.4 '2025-02-04 23:01:15'
sensor.kincony02_temperature03  20.4 '2025-02-04 23:02:15'
sensor.kincony02_temperature03  20.4 '2025-02-04 23:03:15'
sensor.kincony02_temperature03  20.4 '2025-02-04 23:04:15'
sensor.kincony02_temperature03  20.4 '2025-02-04 23:05:15'
sensor.kincony02_temperature03  20.4 '2025-02-04 23:06:15'
sensor.kincony02_temperature03  20.4 '2025-02-04 23:07:15'
sensor.kincony02_temperature03  20.3 '2025-02-04 23:08:15'
sensor.kincony02_temperature03  20.3 '2025-02-04 23:09:15'
sensor.kincony02_temperature03  20.2 '2025-02-04 23:10:15'
sensor.kincony02_temperature03  20.2 '2025-02-04 23:11:15'
sensor.kincony02_temperature03  20.2 '2025-02-04 23:12:15'
sensor.kincony02_temperature03  20.2 '2025-02-04 23:13:15'
sensor.kincony02_temperature03  20.2 '2025-02-04 23:14:15'
sensor.kincony02_temperature03  20.3 '2025-02-04 23:15:15'", header = TRUE) 

Solution

  • You can first format date-times as.POSIXct, then merge with a data.frame of a date-time sequence over the range of last_changed variable, and finally—using zoo::na.locf—replacing each NA with the most recent non-NA prior to it.

    > df1 |> 
    +   transform(last_changed=last_changed |> as.POSIXct()) |> 
    +   merge(
    +     data.frame(last_changed=do.call('seq', c(as.list(range(df1$last_changed)), 
    +                                              by='min'))), all=TRUE) |> 
    +   transform(state=zoo::na.locf(state), 
    +             entity_id=zoo::na.locf(entity_id), 
    +             DT.diff=NULL)
              last_changed                      entity_id state
    1  2025-02-04 23:00:15 sensor.kincony02_temperature03  20.4
    2  2025-02-04 23:01:15 sensor.kincony02_temperature03  20.4
    3  2025-02-04 23:02:15 sensor.kincony02_temperature03  20.4
    4  2025-02-04 23:03:15 sensor.kincony02_temperature03  20.4
    5  2025-02-04 23:04:15 sensor.kincony02_temperature03  20.4
    6  2025-02-04 23:05:15 sensor.kincony02_temperature03  20.4
    7  2025-02-04 23:06:15 sensor.kincony02_temperature03  20.4
    8  2025-02-04 23:07:15 sensor.kincony02_temperature03  20.4
    9  2025-02-04 23:08:15 sensor.kincony02_temperature03  20.3
    10 2025-02-04 23:09:15 sensor.kincony02_temperature03  20.3
    11 2025-02-04 23:10:15 sensor.kincony02_temperature03  20.2
    12 2025-02-04 23:11:15 sensor.kincony02_temperature03  20.2
    13 2025-02-04 23:12:15 sensor.kincony02_temperature03  20.2
    [...]