Search code examples
rdplyr

Interpolation of values for midnight each day for irregular data


My data set consists of readings taken at irregular time intervals

df1 <- read.table(text = "DT   odczyt.1 odczyt.2
'2023-08-16 00:00:00'   482 1.5
'2023-08-16 23:55:00'   483 4.3
'2023-08-17 05:00:00'   484 1.6
'2023-08-17 23:59:45'   485 5.6
'2023-08-18 00:50:00'   486 1.5
'2023-08-18 23:50:45'   487 6.8
'2023-08-19 00:03:00'   586 1.5
'2023-08-19 05:09:45'   578 7.9
'2023-08-19 17:01:00'   521 11.5
'2023-08-19 18:01:15'   365 8.5
'2023-08-19 22:59:30'   490 1.8
'2023-08-19 23:59:45'   456 6.6
'2023-08-20 00:48:00'   478 1.5
'2023-08-20 03:03:00'   461 1.54
'2023-08-20 05:04:15'   397 1.6
'2023-08-20 19:07:00'   492 6.6
'2023-08-20 23:59:45'   493 3.8
'2023-08-21 01:00:00'   471 9.2
'2023-08-21 07:00:30'   493 8.5
'2023-08-21 13:21:45'   495 5.5", header = TRUE) %>% 
  mutate (DT = as.POSIXct(DT))

He would like to estimate the values at 00:00 each day. I was thinking about using na.approx

Expected results with one interpolated value:

df1 <- read.table(text = "DT   odczyt.1 odczyt.2
'2023-08-16 00:00:00'   482 1.5
'2023-08-16 23:55:00'   483 4.3
'2023-08-17 00:00:00'   483.016 4.256
'2023-08-17 05:00:00'   484 1.6
'2023-08-17 23:59:45'   485 5.6
'2023-08-18 00:00:00'   
'2023-08-18 00:50:00'   486 1.5
'2023-08-18 23:50:45'   487 6.8
'2023-08-19 00:00:00'   
'2023-08-19 00:03:00'   586 1.5
'2023-08-19 05:09:45'   578 7.9
'2023-08-19 17:01:00'   521 11.5
'2023-08-19 18:01:15'   365 8.5
'2023-08-19 22:59:30'   490 1.8
'2023-08-19 23:59:45'   456 6.6
'2023-08-20 00:00:00'   
'2023-08-20 00:48:00'   478 1.5
'2023-08-20 03:03:00'   461 1.54
'2023-08-20 05:04:15'   397 1.6
'2023-08-20 19:07:00'   492 6.6
'2023-08-20 23:59:45'   493 3.8
'2023-08-21 00:00:00'   
'2023-08-21 01:00:00'   471 9.2
'2023-08-21 07:00:30'   493 8.5
'2023-08-21 13:21:45'   495 5.5", header = TRUE) %>% 
  mutate (DT = as.POSIXct(DT))

Using na.approx seems not to be complicated if the dates are supplemented with 00:00


Solution

  • A possible approach:

    (fill_gaps ensures regular intervals before interpolating.)

    library(tidyverse)
    library(tsibble)
    library(zoo)
    
    
    df1 <- read.table(text = "DT   odczyt.1 odczyt.2
    '2023-08-16 00:00:00'   482 1.5
    '2023-08-16 23:55:00'   483 4.3
    '2023-08-17 05:00:00'   484 1.6
    '2023-08-17 23:59:45'   485 5.6
    '2023-08-18 00:50:00'   486 1.5
    '2023-08-18 23:50:45'   487 6.8
    '2023-08-19 00:03:00'   586 1.5
    '2023-08-19 05:09:45'   578 7.9
    '2023-08-19 17:01:00'   521 11.5
    '2023-08-19 18:01:15'   365 8.5
    '2023-08-19 22:59:30'   490 1.8
    '2023-08-19 23:59:45'   456 6.6
    '2023-08-20 00:48:00'   478 1.5
    '2023-08-20 03:03:00'   461 1.54
    '2023-08-20 05:04:15'   397 1.6
    '2023-08-20 19:07:00'   492 6.6
    '2023-08-20 23:59:45'   493 3.8
    '2023-08-21 01:00:00'   471 9.2
    '2023-08-21 07:00:30'   493 8.5
    '2023-08-21 13:21:45'   495 5.5", header = TRUE) %>% 
      mutate (DT = as.POSIXct(DT))
    
    df1 |> 
      as_tsibble(index = DT) |> 
      fill_gaps() |> 
      mutate(across(-DT, na.approx)) |> 
      filter(hour(DT) == 0, minute(DT) == 0, second(DT) == 0)
    #> # A tsibble: 6 x 3 [15s] <?>
    #>   DT                  odczyt.1 odczyt.2
    #>   <dttm>                 <dbl>    <dbl>
    #> 1 2023-08-16 00:00:00     482      1.5 
    #> 2 2023-08-17 00:00:00     483.     4.26
    #> 3 2023-08-18 00:00:00     485.     5.58
    #> 4 2023-08-19 00:00:00     562.     2.80
    #> 5 2023-08-20 00:00:00     456.     6.57
    #> 6 2023-08-21 00:00:00     493.     3.82
    

    Created on 2024-03-16 with reprex v2.1.0