Having issues using Pad function to fill in date with time gaps


I am having issues using the pad function (from the padr package) to fill in gaps within a time series. I have some code that downloads hourly data from a server, one day at a time, for a specific time period. After each day of data has been downloaded, the aim is to use pad to clean up the data and add in the missing dates and times so the days can be combined without an error.

The function downloads the data as a list, and the hourly part looks like the following:

 time                  temperature
2019-11-11 00:00:00          3
2019-11-11 01:00:00          4 
2019-11-11 03:00:00          5

I would like the program to automatically fill in the gaps so the result looks like this:

 time                  temperature
2019-11-11 00:00:00          3
2019-11-11 01:00:00          4 
2019-11-11 02:00:00          NA
2019-11-11 03:00:00          5

I have used pad in the code below to fill in the gaps, but if the data starts at 02:00:00, the padded series also starts at that timestep. When I use start_val and end_val, it seems to have problems recognising the date and time. Any help would be appreciated. I have tried a lot of workarounds but had no luck. Bear in mind that the date will be different each time and there is no way of knowing which hour is missing.

    if (nrow(daily$hourly) < 24) {
      daily$hourly <- daily$hourly %>%
        pad(daily$hourly$time, start_val = as.POSIXct('00:00:00'), end_val = as.POSIXct('23:00:00')) %>%
        fill_by_value(value)
    }

Update:

I think the main issue is that R is not recognising that 00:00:00 is the start of the time series, so it will not fill in 01:00:00 as a gap. Both solutions worked when the gap was in a different place. Any thoughts? See the structure below.

structure(list(time = structure(c(1521936000, 1521939600, 1521943200, 
1521946800, 1521950400, 1521954000, 1521957600, 1521961200, 1521964800, 
1521968400, 1521972000, 1521975600, 1521979200, 1521982800, 1521986400, 
1521990000, 1521993600, 1521997200, 1522000800, 1522004400, 1522008000, 
1522011600, 1522015200), class = c("POSIXct", "POSIXt"), tzone = ""), 
    summary = c("Overcast", "Overcast", "Overcast", "Overcast", 
    "Overcast", "Overcast", "Overcast", "Foggy", "Mostly Cloudy", 
    "Mostly Cloudy", "Overcast", "Mostly Cloudy", "Mostly Cloudy", 
    "Mostly Cloudy", "Mostly Cloudy", "Mostly Cloudy", "Partly Cloudy", 
    "Partly Cloudy", "Partly Cloudy", "Partly Cloudy", "Partly Cloudy", 
    "Clear", "Clear"), icon = c("cloudy", "cloudy", "cloudy", 
    "cloudy", "cloudy", "cloudy", "cloudy", "fog", "partly-cloudy-day", 
    "partly-cloudy-day", "cloudy", "partly-cloudy-day", "partly-cloudy-day", 
    "partly-cloudy-day", "partly-cloudy-day", "partly-cloudy-day", 
    "partly-cloudy-day", "partly-cloudy-day", "partly-cloudy-day", 
    "partly-cloudy-night", "partly-cloudy-night", "clear-night", 
    "clear-night"), precipIntensity = c(0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L), precipProbability = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L), temperature = c(7.28, 7.3, 7.21, 7.08, 7.03, 7.02, 7.15, 
    7.19, 7.38, 7.83, 8.43, 9.35, 9.89, 10.54, 10.81, 11.07, 
    11.55, 11.31, 10.52, 9.67, 8.67, 7.94, 6.93), apparentTemperature = c(7.28, 
    7.3, 7.21, 7.08, 7.03, 7.02, 7.15, 7.19, 7.38, 7.33, 8.43, 
    9.35, 9.64, 10.54, 10.81, 11.07, 11.55, 11.31, 10.52, 9.67, 
    8.67, 7.94, 6.93), dewPoint = c(4.99, 5.07, 5.03, 4.99, 4.86, 
    5.04, 5.41, 5.6, 5.55, 5.62, 5.57, 5.79, 5.84, 5.7, 5.4, 
    5.08, 4.4, 4.2, 4.37, 4.32, 4.02, 4.06, 3.73), humidity = c(0.85, 
    0.86, 0.86, 0.87, 0.86, 0.87, 0.89, 0.9, 0.88, 0.86, 0.82, 
    0.78, 0.76, 0.72, 0.69, 0.67, 0.61, 0.62, 0.66, 0.69, 0.73, 
    0.76, 0.8), pressure = c(1005.4, 1005.7, 1006, 1006.4, 1006.7, 
    1007.2, 1007.7, 1008.6, 1009.4, 1010.3, 1010.9, 1011.6, 1011.7, 
    1012.1, 1012.2, 1012.3, 1012.4, 1012.6, 1013.3, 1013.8, 1014.5, 
    1014.8, 1015.3), windSpeed = c(0.35, 0.48, 0.55, 0.33, 0.36, 
    0.6, 0.85, 1.05, 1.29, 1.38, 0.89, 1.33, 1.39, 1.44, 1.63, 
    1.57, 1.46, 1.27, 0.57, 0.23, 0.03, 0.27, 0.2), windGust = c(0.48, 
    0.81, 0.95, 0.42, 0.44, 0.96, 1.14, 1.28, 2.03, 1.99, 1.72, 
    2.51, 2.48, 2.66, 2.48, 2.46, 2.42, 1.67, 0.65, 0.27, 0.03, 
    0.27, 0.2), windBearing = c(28L, 6L, 12L, 1L, 12L, 3L, 12L, 
    23L, 40L, 41L, 26L, 22L, 15L, 21L, 9L, 11L, 10L, 18L, 16L, 
    17L, NA, 273L, 284L), cloudCover = c(0.98, 0.98, 0.98, 0.93, 
    0.89, 0.93, 0.97, 0.94, 0.82, 0.83, 0.99, 0.75, 0.75, 0.75, 
    0.75, 0.73, 0.51, 0.49, 0.46, 0.46, 0.44, 0.1, 0), uvIndex = c(0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 
    1L, 0L, 0L, 0L, 0L, 0L, 0L), visibility = c(6.74, 6.064, 
    6.532, 6.035, 6.054, 6.006, 4.033, 3.047, 4.369, 5.512, 6.856, 
    8.129, 9.269, 9.488, 10.003, 10.003, 10.003, 10.003, 10.003, 
    10.003, 10.003, 10.003, 9.521)), row.names = c(NA, -23L), class = "data.frame")
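One thing worth checking, offered as a possibility rather than a diagnosis: the 23 epoch values in the structure above are exactly 3600 seconds apart, starting at 2018-03-25 00:00:00 UTC, so in UTC there is no gap between the first and second rows. Because `tzone = ""` means the timestamps print in the session's local time zone, a zone that switched to daylight saving time that night would show 00:00 followed by 02:00, which looks like a missing 01:00. The sketch below assumes `Europe/London` purely for illustration; the actual session time zone is not given in the question.

```r
# First two timestamps from the dput() output, in seconds since 1970-01-01 UTC.
secs <- c(1521936000, 1521939600)

# The same instants interpreted in UTC and in an ASSUMED local zone.
utc   <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
local <- as.POSIXct(secs, origin = "1970-01-01", tz = "Europe/London")

diff(secs)               # 3600: the rows are one hour apart
format(utc,   "%H:%M")   # "00:00" "01:00"
format(local, "%H:%M")   # "00:00" "02:00" (01:00 local did not exist that night)
```

If that is what is happening, the day really does contain only 23 local hours, and doing the padding on UTC timestamps would avoid the apparent gap.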

Solution

  • You can use complete from tidyr and create an hourly sequence between the minimum and maximum time:

    tidyr::complete(df, time = seq(min(time), max(time), by = "1 hour"))
    
    #  time                temperature
    #  <dttm>                    <int>
    #1 2019-11-11 00:00:00           3
    #2 2019-11-11 01:00:00           4
    #3 2019-11-11 02:00:00          NA
    #4 2019-11-11 03:00:00           5
    

    data

    df <- structure(list(time = structure(c(1573401600, 1573405200, 1573412400
    ), class = c("POSIXct", "POSIXt"), tzone = ""), temperature = 3:5), 
    row.names = c(NA, -3L), class = "data.frame")
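A sequence built from `min(time)` to `max(time)` cannot add hours before the first or after the last observation, which is the case raised in the question when the data starts at 02:00:00. A minimal sketch of one way to cover the whole day is shown here: take midnight of the day the data falls on and build 24 hourly steps from it. The toy data frame, the use of UTC, and the names `day_start`, `full_day`, and `padded` are assumptions for illustration. Base `merge()` is used only to keep the example free of dependencies.

```r
# Toy data (hypothetical): the 00:00 and 02:00 readings are missing.
df <- data.frame(
  time = as.POSIXct(c("2019-11-11 01:00:00", "2019-11-11 03:00:00"), tz = "UTC"),
  temperature = c(4L, 5L)
)

# Midnight of the day the data falls on, derived from the data itself.
day_start <- as.POSIXct(trunc(min(df$time), units = "days"))

# All 24 hourly timestamps for that day (00:00 to 23:00).
full_day <- data.frame(time = seq(day_start, by = "1 hour", length.out = 24))

# Left join: every hour is kept, and hours with no reading get NA.
padded <- merge(full_day, df, by = "time", all.x = TRUE)

nrow(padded)   # 24
head(padded, 4)
```

The same `full_day$time` sequence could be passed to `tidyr::complete(df, time = ...)` in place of the min-to-max sequence used in the answer. Working in UTC keeps every day at 24 hours; in a local zone with daylight saving time, some days have 23 or 25 hours.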