Tags: r, dataframe, aggregate, mean, lubridate

Hourly mean of multiple variables in R data.frame?


I have the following code and am trying to find the hourly mean of each variable (i.e., X, Y, and Z). My output should be a data.frame with an hourlyDate column and the hourly mean of every variable. Any way forward would be appreciated.

library(lubridate)

set.seed(123)

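# One row every 5 minutes from 2011-01-01 00:00 to 2011-12-31 00:00 (104833 rows)
T <- data.frame(Datetime = seq(ymd_hms("2011-01-01 00:00:00"), to = ymd_hms("2011-12-31 00:00:00"), by = "5 min"),
                X = runif(104833, 5, 10), Y = runif(104833, 5, 10), Z = runif(104833, 5, 10))
# Character helper columns holding the date, hour, and minute of each row
T$Date <- format(T$Datetime, format = "%Y-%m-%d")
T$Hour <- format(T$Datetime, format = "%H")
T$Mints <- format(T$Datetime, format = "%M")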
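The hard-coded length 104833 is simply the number of 5-minute timestamps in that range. A quick base R check (a sketch; it rebuilds the timestamps with `as.POSIXct()` in UTC instead of `ymd_hms()`, which also defaults to UTC) shows where the number comes from and how many rows fall into each clock hour:

```r
# Rebuild the question's timestamps with base R (UTC, as ymd_hms() uses by default)
start  <- as.POSIXct("2011-01-01 00:00:00", tz = "UTC")
end    <- as.POSIXct("2011-12-31 00:00:00", tz = "UTC")
stamps <- seq(start, end, by = "5 min")

length(stamps)         # 104833
364 * 24 * 12 + 1      # 104833: 364 days x 288 stamps per day, plus the final endpoint

# Rows per clock hour: every hour holds 12 stamps except the last one,
# which contains only the endpoint 2011-12-31 00:00:00
counts <- table(format(stamps, "%Y-%m-%d %H"))
length(counts)         # 8737 hourly groups
table(as.vector(counts))
```

This is why an hourly summary of this data has 8,737 rows, and why the mean for the final hour is based on a single observation.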

Solution

  • Try rounding each timestamp down to its hour with floor_date() and then averaging by group with dplyr:

    library(lubridate)
    library(dplyr)
    
    set.seed(123)
    
    T <- data.frame(Datetime = seq(ymd_hms("2011-01-01 00:00:00"), to = ymd_hms("2011-12-31 00:00:00"), by = "5 min"),
                    X = runif(104833, 5, 10), Y = runif(104833, 5, 10), Z = runif(104833, 5, 10))
    
    T %>%
      mutate(hourlyDate = floor_date(Datetime, unit = "hour")) %>%  # round each timestamp down to its hour
      select(-Datetime) %>%
      group_by(hourlyDate) %>%
      summarize(across(everything(), mean), .groups = "drop")        # mean of X, Y, Z per hour; result is ungrouped
    #> # A tibble: 8,737 x 4
    #>    hourlyDate              X     Y     Z
    #>    <dttm>              <dbl> <dbl> <dbl>
    #>  1 2011-01-01 00:00:00  8.00  7.90  6.90
    #>  2 2011-01-01 01:00:00  7.93  7.47  7.90
    #>  3 2011-01-01 02:00:00  7.83  6.89  7.67
    #>  4 2011-01-01 03:00:00  6.61  7.92  7.18
    #>  5 2011-01-01 04:00:00  7.27  7.20  6.48
    #>  6 2011-01-01 05:00:00  7.88  6.80  7.69
    #>  7 2011-01-01 06:00:00  7.07  8.05  7.52
    #>  8 2011-01-01 07:00:00  7.40  7.92  6.99
    #>  9 2011-01-01 08:00:00  7.97  7.76  7.26
    #> 10 2011-01-01 09:00:00  7.57  7.47  6.94
    #> # ... with 8,727 more rows
    

    Created on 2020-08-20 by the reprex package (v0.3.0)
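If you would rather stay in base R, which also returns a plain data.frame as the question asks (the dplyr pipeline returns a tibble), `aggregate()` with a formula can do the same grouping. This is a sketch under a few assumptions: the hour is obtained with `trunc(..., units = "hours")` instead of `floor_date()`, the data frame is named `df` here (`T` works too, but it masks R's shorthand for `TRUE`), and the column name `hourlyDate` is borrowed from the answer above.

```r
set.seed(123)

# Same simulated data as the question, built without lubridate
Datetime <- seq(as.POSIXct("2011-01-01 00:00:00", tz = "UTC"),
                as.POSIXct("2011-12-31 00:00:00", tz = "UTC"), by = "5 min")
n  <- length(Datetime)
df <- data.frame(Datetime = Datetime,
                 X = runif(n, 5, 10), Y = runif(n, 5, 10), Z = runif(n, 5, 10))

# Round each timestamp down to the start of its hour;
# trunc() returns POSIXlt, so convert back to POSIXct before grouping
df$hourlyDate <- as.POSIXct(trunc(df$Datetime, units = "hours"))

# Mean of X, Y and Z within each hour; the result is an ordinary data.frame
hourly <- aggregate(cbind(X, Y, Z) ~ hourlyDate, data = df, FUN = mean)
head(hourly)
```

Because the seed and the order of the `runif()` calls are unchanged, this should reproduce the same hourly means as the dplyr pipeline, one row per hour in chronological order.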