
How to fill in missing dates by minute by group in R


I am attempting to fill in missing minutes from a dataframe that has different groups. I would like the missing minutes to be filled in with zeroes.

I tried the approach from "R - Fill missing dates by group", but could not find a way to adapt it to fill in missing minutes.

Datetime            | Group | Value |
2019-01-01 00:00:00 |  1    |  5    |
2019-01-01 00:00:00 |  2    |  4    |
2019-01-01 00:00:00 |  3    |  2    | 
2019-01-01 00:01:00 |  1    |  1    |
2019-01-01 00:02:00 |  1    |  2    | 
2019-01-01 00:02:00 |  2    |  2    |
2019-01-01 00:02:00 |  3    |  1    |
2019-01-01 00:03:00 |  1    |  1    |
2019-01-01 00:03:00 |  2    |  2    |
2019-01-01 00:04:00 |  1    |  1    |

I would like the final table to look like this:

Datetime            | Group | Value |
2019-01-01 00:00:00 |  1    |  5    |
2019-01-01 00:00:00 |  2    |  4    |
2019-01-01 00:00:00 |  3    |  2    | 
2019-01-01 00:01:00 |  1    |  1    |
2019-01-01 00:01:00 |  2    |  0    | 
2019-01-01 00:01:00 |  3    |  0    |
2019-01-01 00:02:00 |  1    |  2    |
2019-01-01 00:02:00 |  2    |  2    |
2019-01-01 00:02:00 |  3    |  1    |
2019-01-01 00:03:00 |  1    |  1    |
2019-01-01 00:03:00 |  2    |  2    |
2019-01-01 00:03:00 |  3    |  0    |
2019-01-01 00:04:00 |  1    |  1    |
2019-01-01 00:04:00 |  2    |  0    |
2019-01-01 00:04:00 |  3    |  0    |

Solution

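  • Using padr, pad() inserts the missing minutes within each group and fill_by_value() replaces the resulting NA values with 0. Each group is padded only between its own first and last timestamp, so minutes after a group's last observation (for example 00:04 for group 2) are not added.

    library(dplyr)
    library(padr)

    df %>%
      pad(group = 'Group', interval = 'min') %>%   # explicitly pad at 1-minute intervals, per group
      fill_by_value(Value)                         # replace NA in Value with the default fill value, 0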
    
    #pad applied on the interval: min
    #              Datetime Group Value
    #1  2019-01-01 00:00:00     1     5
    #2  2019-01-01 00:01:00     1     1
    #3  2019-01-01 00:02:00     1     2
    #4  2019-01-01 00:03:00     1     1
    #5  2019-01-01 00:04:00     1     1
    #6  2019-01-01 00:00:00     2     4
    #7  2019-01-01 00:01:00     2     0    # added
    #8  2019-01-01 00:02:00     2     2
    #9  2019-01-01 00:03:00     2     2
    #10 2019-01-01 00:00:00     3     2
    #11 2019-01-01 00:01:00     3     0    # added
    #12 2019-01-01 00:02:00     3     1
    

    Data

    df <- read.table(
      header = TRUE,
      stringsAsFactors = FALSE, sep = "|",
      text = "Datetime            | Group | Value
    2019-01-01 00:00:00 |  1    |  5  
    2019-01-01 00:00:00 |  2    |  4    
    2019-01-01 00:00:00 |  3    |  2     
    2019-01-01 00:01:00 |  1    |  1  
    2019-01-01 00:02:00 |  1    |  2     
    2019-01-01 00:02:00 |  2    |  2    
    2019-01-01 00:02:00 |  3    |  1    
    2019-01-01 00:03:00 |  1    |  1    
    2019-01-01 00:03:00 |  2    |  2    
    2019-01-01 00:04:00 |  1    |  1"
    ) 
    df$Datetime <- lubridate::ymd_hms(df$Datetime)
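
If every group should cover the full range from the overall first to the overall last minute, as in the desired table in the question, one option is to build the complete minute-by-group grid and join the observed values onto it. The following is a minimal base R sketch of that idea, not part of the original answer. It rebuilds the example data inline, assumes the timestamps are in UTC, and the names `all_minutes`, `grid` and `filled` are only illustrative.

```r
# Example data from the question; Datetime parsed as POSIXct (UTC is an assumption).
df <- data.frame(
  Datetime = as.POSIXct(
    c("2019-01-01 00:00:00", "2019-01-01 00:00:00", "2019-01-01 00:00:00",
      "2019-01-01 00:01:00", "2019-01-01 00:02:00", "2019-01-01 00:02:00",
      "2019-01-01 00:02:00", "2019-01-01 00:03:00", "2019-01-01 00:03:00",
      "2019-01-01 00:04:00"),
    tz = "UTC"
  ),
  Group = c(1, 2, 3, 1, 1, 2, 3, 1, 2, 1),
  Value = c(5, 4, 2, 1, 2, 2, 1, 1, 2, 1)
)

# Every minute between the overall first and last timestamp.
all_minutes <- seq(min(df$Datetime), max(df$Datetime), by = "min")

# Full grid: one row per (minute, group) combination.
grid <- expand.grid(Datetime = all_minutes, Group = sort(unique(df$Group)))

# Left join the observed values onto the grid; combinations absent from df get NA.
filled <- merge(grid, df, by = c("Datetime", "Group"), all.x = TRUE)

# Replace the NA values introduced by the join with 0.
filled$Value[is.na(filled$Value)] <- 0

# Order by minute, then group, to match the layout of the desired table.
filled <- filled[order(filled$Datetime, filled$Group), ]
rownames(filled) <- NULL

print(filled)
```

The grid is what makes the trailing minutes appear: a group with no observation at a given minute still has a row in `grid`, so the join leaves an NA there that is then set to 0. For readers already using the tidyverse, `tidyr::complete()` with its `fill` argument expresses the same grid-and-fill idea.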