I have some time series data where values are given every 15 minutes. For this dataset we can assume that the values do not change within that 15 minute period; therefore if the value was 1 at 8:15am, and 2 at 8:30am, then we assume that from 8:15-8:30 the value was 1, and from 8:30-8:45 the value was 2.
Now I want to calculate the monthly total of minutes when values were greater than 5.
My dataframe looks something like this (Except with 15+ columns of values).
# Create a, b, c, d variables
a <- c(
"06-25-2021 08:00:00 AM",
"06-25-2021 08:15:00 AM",
"06-25-2021 08:30:00 AM",
"06-25-2021 08:45:00 AM",
"07-25-2021 08:45:00 AM",
"07-25-2021 09:00:00 AM",
"08-25-2021 08:45:00 AM",
"08-25-2021 09:00:00 AM",
"09-25-2021 09:15:00 AM",
"09-25-2021 09:30:00 AM"
)
b = c(4, 5, 8, NA, 4, 5, NA, 7, 7, 6)
c = c(6, 10, 8, NA, 8, 5, NA, 8, 7, 2)
d = c(1, 3 ,NA, 6, 4, 8, 2, 4, NA, 10)
df =
tibble(a, b, c, d)
df$a = as.POSIXlt(df$a, format = "%m-%d-%Y%H:%M:%S", tz = 'EST')
but I want it to look like this
Name = c("b", "c", "d")
June = c(15, 45, 15 )
Jul = c(NA, 15, 15)
Aug = c(15, 15, NA)
Sept = c(45, 30, 15)
df_2 = tibble (Name,June,Jul,Aug,Sept)
I'm not sure how to sum
and filter
for this when it's a time series. Anyone have any suggestions?
Based on the description - perhaps we replace
the values in columns 'b' to 'd' that are less than or equal to 5 will NA
, reshape to 'long' format with pivot_longer
, get the month
from the 'a' column and reshape back to 'wide' with pivot_wider
library(dplyr)
library(tidyr)
df %>%
mutate(across(2:4, ~ replace(., . <=5, NA))) %>%
pivot_longer(cols = b:d, names_to = 'Name', values_drop_na = TRUE) %>%
mutate(a = format(a, '%b')) %>%
pivot_wider(names_from = a, values_from = value,
values_fill = 0, values_fn = list(value = function(x) length(x) * 15))