I am trying to fill in missing minutes in a data frame that contains several groups, so that every group has a row for every minute. The missing minutes should be filled in with zeroes.
I tried the approach from "R - Fill missing dates by group", but I cannot find a way to adapt it to missing minutes.
Datetime | Group | Value |
2019-01-01 00:00:00 | 1 | 5 |
2019-01-01 00:00:00 | 2 | 4 |
2019-01-01 00:00:00 | 3 | 2 |
2019-01-01 00:01:00 | 1 | 1 |
2019-01-01 00:02:00 | 1 | 2 |
2019-01-01 00:02:00 | 2 | 2 |
2019-01-01 00:02:00 | 3 | 1 |
2019-01-01 00:03:00 | 1 | 1 |
2019-01-01 00:03:00 | 2 | 2 |
2019-01-01 00:04:00 | 1 | 1 |
I would like the final table to look like this:
Datetime | Group | Value |
2019-01-01 00:00:00 | 1 | 5 |
2019-01-01 00:00:00 | 2 | 4 |
2019-01-01 00:00:00 | 3 | 2 |
2019-01-01 00:01:00 | 1 | 1 |
2019-01-01 00:01:00 | 2 | 0 |
2019-01-01 00:01:00 | 3 | 0 |
2019-01-01 00:02:00 | 1 | 2 |
2019-01-01 00:02:00 | 2 | 2 |
2019-01-01 00:02:00 | 3 | 1 |
2019-01-01 00:03:00 | 1 | 1 |
2019-01-01 00:03:00 | 2 | 2 |
2019-01-01 00:03:00 | 3 | 0 |
2019-01-01 00:04:00 | 1 | 1 |
2019-01-01 00:04:00 | 2 | 0 |
2019-01-01 00:04:00 | 3 | 0 |
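You can use pad() and fill_by_value() from the padr package:
library(dplyr)
library(padr)

df %>%
  pad(group = "Group", interval = "min") %>%  # explicitly pad each group in 1-minute steps
  fill_by_value(Value)                        # replace the NA values created by pad() with 0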
# pad applied on the interval: min
#               Datetime Group Value
# 1  2019-01-01 00:00:00     1     5
# 2  2019-01-01 00:01:00     1     1
# 3  2019-01-01 00:02:00     1     2
# 4  2019-01-01 00:03:00     1     1
# 5  2019-01-01 00:04:00     1     1
# 6  2019-01-01 00:00:00     2     4
# 7  2019-01-01 00:01:00     2     0  # added
# 8  2019-01-01 00:02:00     2     2
# 9  2019-01-01 00:03:00     2     2
# 10 2019-01-01 00:00:00     3     2
# 11 2019-01-01 00:01:00     3     0  # added
# 12 2019-01-01 00:02:00     3     1
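Note that the pad() output above stops at each group's own last timestamp (00:03 for group 2, 00:02 for group 3), while the desired table has all three groups running to 00:04. As a rough sketch of one alternative, assuming the tidyr package is available (it is not part of the code above), complete() can build the full minute-by-group grid and fill the gaps with 0. The name `full` and the inline copy of the sample data are only there to keep the example self-contained.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question, built inline so the sketch runs on its own.
df <- data.frame(
  Datetime = as.POSIXct("2019-01-01 00:00:00", tz = "UTC") +
    60 * c(0, 0, 0, 1, 2, 2, 2, 3, 3, 4),
  Group = c(1, 2, 3, 1, 1, 2, 3, 1, 2, 1),
  Value = c(5, 4, 2, 1, 2, 2, 1, 1, 2, 1)
)

full <- df %>%
  complete(
    # every minute between the overall first and last timestamp
    Datetime = seq(min(Datetime), max(Datetime), by = "1 min"),
    # crossed with every group that occurs in the data
    Group,
    # rows created by complete() get Value = 0 instead of NA
    fill = list(Value = 0)
  ) %>%
  arrange(Datetime, Group)

full
```

With the sample data this should give 15 rows (5 minutes x 3 groups), ordered by Datetime and then Group as in the desired table.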
Data
df <- read.table(
  header = TRUE,
  stringsAsFactors = FALSE, sep = "|",
  text = "Datetime | Group | Value
2019-01-01 00:00:00 | 1 | 5
2019-01-01 00:00:00 | 2 | 4
2019-01-01 00:00:00 | 3 | 2
2019-01-01 00:01:00 | 1 | 1
2019-01-01 00:02:00 | 1 | 2
2019-01-01 00:02:00 | 2 | 2
2019-01-01 00:02:00 | 3 | 1
2019-01-01 00:03:00 | 1 | 1
2019-01-01 00:03:00 | 2 | 2
2019-01-01 00:04:00 | 1 | 1"
)
# Parse the character column into POSIXct (ymd_hms() uses UTC by default)
df$Datetime <- lubridate::ymd_hms(df$Datetime)