Tags: r, date, temperature

Conglomerating hourly temperature data in R


My goal is to find the minimum and maximum daily temperatures and add them to a data frame. My current data frame looks like the following:

ROW DATE_TIME  TEMP (DEG C)
1   5/1/1999   4.6
2   5/1/1999   3.8
3   5/1/1999   2.9

I am trying to get the daily temperature range from this data, but the main issue I run into is the "non-standard" date format. The dataset is several thousand data points long, so I would like code that computes max minus min for every 24 rows in order to get the daily variation in temperature.

Thank you!


Solution

  • If you want to calculate it using a running window, you can use the function gtools::running() and set its by and width arguments to 24, so that each window covers one non-overlapping block of 24 rows.

    library(tidyverse)  # provides the %>% pipe
    library(gtools)     # provides running()
    
    set.seed(123)
    df <- data.frame(row = c(seq(1, 24*5, by = 1)), 
                     date = as.Date(c(
                       rep(c("02/25/92"), 24), 
                       rep(c("02/26/92"), 24),
                       rep(c("02/27/92"), 24),
                       rep(c("02/28/92"), 24), 
                       rep(c("02/29/92"), 24)), 
                       format = "%m/%d/%y"),
                     temp = rnorm(24*5, mean = 5, sd = 5)) 
    
    #Function to calculate the min. and max. of a vector/column 
    MinMaxFunction <- function(x) {
      return(data.frame(min = min(x, na.rm = TRUE), 
                 max = max(x, na.rm = TRUE)))
    }
    
    #Calculating the min. and max. for each block of 24 rows
    dfRunningMinMax <- running(df$temp,
                               fun = MinMaxFunction,
                               by = 24, 
                               width = 24) %>%
      t() %>% 
      as.data.frame()
    
    dfRunningMinMax
    
                 min      max
    1:24   -4.833086 13.93457
    25:48  -3.433467 15.84478
    49:72  -6.545844 15.25042
    73:96  -1.103589 11.80326
    97:120  -3.33971 15.93666
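
If adding gtools is not an option, the same fixed 24-row blocks can be sketched in base R. This is only an illustration under stated assumptions: the data frame `temps`, its column names, and the random values are hypothetical, and the approach assumes every day has exactly 24 rows with no missing hours.

```r
# Hypothetical example data: 3 days of hourly readings (72 rows)
set.seed(1)
temps <- data.frame(temp = rnorm(72, mean = 5, sd = 5))

# Label each run of 24 consecutive rows with a block number: 1, 1, ..., 2, 2, ...
temps$block <- (seq_len(nrow(temps)) - 1) %/% 24 + 1

# Minimum and maximum per block
block_min <- tapply(temps$temp, temps$block, min, na.rm = TRUE)
block_max <- tapply(temps$temp, temps$block, max, na.rm = TRUE)

# Collect the results and add the range (max minus min) per block
block_summary <- data.frame(block = as.integer(names(block_min)),
                            min = as.vector(block_min),
                            max = as.vector(block_max))
block_summary$range <- block_summary$max - block_summary$min
block_summary
```

If any hours are missing, the blocks drift out of step with the calendar days, so grouping by date is usually the safer choice.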
    

    Alternatively, you can use the tidyverse approach and calculate the min. and max. for each date. This does not rely on every day having exactly 24 rows.

    library(tidyverse)
    
    df %>% 
      group_by(date) %>% 
      summarise(min = min(temp, na.rm = TRUE), 
                max = max(temp, na.rm = TRUE))
    
      date         min   max
      <date>     <dbl> <dbl>
    1 1992-02-25 -4.83  13.9
    2 1992-02-26 -3.43  15.8
    3 1992-02-27 -6.55  15.3
    4 1992-02-28 -1.10  11.8
    5 1992-02-29 -3.34  15.9
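
The question's dates are text such as 5/1/1999, so they need to be parsed before grouping. The following is a minimal base R sketch, assuming the format is month/day/four-digit year (it could equally be day/month, which would need "%d/%m/%Y"). The data frame `hourly`, the column name `TEMP`, and the second day's values are hypothetical; only the first three readings come from the question.

```r
# Hypothetical data laid out like the question's data frame
hourly <- data.frame(
  DATE_TIME = rep(c("5/1/1999", "5/2/1999"), each = 3),
  TEMP = c(4.6, 3.8, 2.9, 6.1, 7.4, 5.0)
)

# Parse the text dates; %Y is a four-digit year (%y would be two-digit)
# Assumption: month comes first, then day
hourly$date <- as.Date(hourly$DATE_TIME, format = "%m/%d/%Y")

# Daily minimum and maximum per calendar date
daily_min <- aggregate(TEMP ~ date, data = hourly, FUN = min)
daily_max <- aggregate(TEMP ~ date, data = hourly, FUN = max)

# Combine into one data frame and add the daily range (max minus min)
daily <- data.frame(date = daily_min$date,
                    min = daily_min$TEMP,
                    max = daily_max$TEMP)
daily$range <- daily$max - daily$min
daily
```

Once the date column is a proper Date, the group_by(date) approach shown above works on it unchanged.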