Search code examples
rmeanlubridateutcmutate

How to write Date and Time columns together in column as (yyyy-mm-dd hh:mm:ss+00) in output?


I have various text files(with per minute data) in different folders in following manner. The start date of each file differs but the structure of each file is same as below:

Date/Time   Col1   Col2   Col3....
2021-01-01T00:00:00  20  12  34...
2021-01-01T00:01:00  .....
2021-01-01T00:02:00  .....
2021-01-01T00:03:00  .....
2021-01-01T01:04:00  .....
2021-01-01T01:05:00  .....
2021-01-01T01:05:00  .....
2021-01-01T01:07:00  .....
2021-01-01T02:08:00  .....

I have calculated every 15rows mean of my data (.txt file) using a code in R mentioned later. I want Date and Time column in a particular manner specified below. Time is in UTC format.

rep_TimeZ   Col1   Col2   Col3....
2021-01-01 00:00:00+00  12  36  56
2021-01-01 00:15:00+00  34  54  43
2021-01-01 00:30:00+00  24  23  21
2021-01-01 00:45:00+00  12  36  56
2021-01-01 01:00:00+00  34  54  43
2021-01-01 01:15:00+00  24  23  21
2021-01-01 01:30:00+00  12  36  43
2021-01-01 01:45:00+00  12  36  34
2021-01-01 02:00:00+00  12  36  34 
.
.

I am using following code in R to calculate 15rows average iteratively through different folders.

library(lubridate)

mn <- df %>% separate(`Date/Time`, into = c("Date", "Time"), sep = "T")

cl <- c('C1', 'C3', 'C7')

mnf <- mn[ , cl ] %>%
  as_tibble() %>%
  group_by(group = as.integer(gl(n(), 15, n()))) %>%
  summarise(across(everything(), ~ if(mean(is.na(.x)) > 0.8) NA else 
  mean(.x, na.rm = TRUE))) 
 
mnf

write.csv(min, 'C:/Users/Alexia/Desktop/Test/15row.csv')

Can someone please help me in modifying the existing code to write Date and Time columns together in one column (e.g. yyyy-mm-dd hh:mm:ss+00) as shown above.


Solution

  • You could simply reformat the Time column, but i think it's better to do all in one go, and create the Time column in the way that you want, while you aggregate the data.

    Additionaly, I made a different approach to define the groups as.numeric(`Date/Time`) %/% (15*60) instead of as.integer(gl(n(), 15, n())). I think it makes what's happening more clear, but that's just my preference.

    df %>%
      group_by(groups = as.numeric(rep_TimeZ) %/% (15*60)) %>%
      summarise(rep_TimeZ = rep_TimeZ[1],
                across(Col1:Col3, mean)) %>%
      select(-groups)
    

    Result:

    # A tibble: 200 × 4
       rep_TimeZ            Col1  Col2  Col3
       <dttm>              <dbl> <dbl> <dbl>
     1 2021-01-01 00:00:00 0.451 0.448 0.514
     2 2021-01-01 00:15:00 0.577 0.585 0.572
     3 2021-01-01 00:30:00 0.496 0.475 0.482
     4 2021-01-01 00:45:00 0.539 0.516 0.512
     5 2021-01-01 01:00:00 0.392 0.437 0.414
     6 2021-01-01 01:15:00 0.568 0.620 0.517
     7 2021-01-01 01:30:00 0.590 0.588 0.444
     8 2021-01-01 01:45:00 0.417 0.510 0.364
     9 2021-01-01 02:00:00 0.377 0.446 0.306
    10 2021-01-01 02:15:00 0.613 0.614 0.501
    # … with 190 more rows
    

    Dummy data:

    df = tibble(rep_TimeZ = seq(as.POSIXct('2021-01-01 00:00:00'), by = 60, length.out = 3000),
                Col1 = runif(3000), Col2 = runif(3000), Col3 = runif(3000))