I have various text files(with per minute data) in different folders in following manner. The start date of each file differs but the structure of each file is same as below:
Date/Time Col1 Col2 Col3....
2021-01-01T00:00:00 20 12 34...
2021-01-01T00:01:00 .....
2021-01-01T00:02:00 .....
2021-01-01T00:03:00 .....
2021-01-01T01:04:00 .....
2021-01-01T01:05:00 .....
2021-01-01T01:05:00 .....
2021-01-01T01:07:00 .....
2021-01-01T02:08:00 .....
I have calculated every 15rows mean of my data (.txt file) using a code in R mentioned later. I want Date and Time column in a particular manner specified below. Time is in UTC format.
rep_TimeZ Col1 Col2 Col3....
2021-01-01 00:00:00+00 12 36 56
2021-01-01 00:15:00+00 34 54 43
2021-01-01 00:30:00+00 24 23 21
2021-01-01 00:45:00+00 12 36 56
2021-01-01 01:00:00+00 34 54 43
2021-01-01 01:15:00+00 24 23 21
2021-01-01 01:30:00+00 12 36 43
2021-01-01 01:45:00+00 12 36 34
2021-01-01 02:00:00+00 12 36 34
.
.
I am using following code in R to calculate 15rows average iteratively through different folders.
library(lubridate)
mn <- df %>% separate(`Date/Time`, into = c("Date", "Time"), sep = "T")
cl <- c('C1', 'C3', 'C7')
mnf <- mn[ , cl ] %>%
as_tibble() %>%
group_by(group = as.integer(gl(n(), 15, n()))) %>%
summarise(across(everything(), ~ if(mean(is.na(.x)) > 0.8) NA else
mean(.x, na.rm = TRUE)))
mnf
write.csv(min, 'C:/Users/Alexia/Desktop/Test/15row.csv')
Can someone please help me in modifying the existing code to write Date and Time columns together in one column (e.g. yyyy-mm-dd hh:mm:ss+00) as shown above.
You could simply reformat the Time column, but i think it's better to do all in one go, and create the Time column in the way that you want, while you aggregate the data.
Additionaly, I made a different approach to define the groups as.numeric(`Date/Time`) %/% (15*60)
instead of as.integer(gl(n(), 15, n()))
. I think it makes what's happening more clear, but that's just my preference.
df %>%
group_by(groups = as.numeric(rep_TimeZ) %/% (15*60)) %>%
summarise(rep_TimeZ = rep_TimeZ[1],
across(Col1:Col3, mean)) %>%
select(-groups)
Result:
# A tibble: 200 × 4
rep_TimeZ Col1 Col2 Col3
<dttm> <dbl> <dbl> <dbl>
1 2021-01-01 00:00:00 0.451 0.448 0.514
2 2021-01-01 00:15:00 0.577 0.585 0.572
3 2021-01-01 00:30:00 0.496 0.475 0.482
4 2021-01-01 00:45:00 0.539 0.516 0.512
5 2021-01-01 01:00:00 0.392 0.437 0.414
6 2021-01-01 01:15:00 0.568 0.620 0.517
7 2021-01-01 01:30:00 0.590 0.588 0.444
8 2021-01-01 01:45:00 0.417 0.510 0.364
9 2021-01-01 02:00:00 0.377 0.446 0.306
10 2021-01-01 02:15:00 0.613 0.614 0.501
# … with 190 more rows
Dummy data:
df = tibble(rep_TimeZ = seq(as.POSIXct('2021-01-01 00:00:00'), by = 60, length.out = 3000),
Col1 = runif(3000), Col2 = runif(3000), Col3 = runif(3000))