
How to calculate average of a variable by hour in R


I'm having trouble when trying to calculate the average temperature by hour.

I have a data frame with date, time (hh:mm:ss p.m./a.m.) and temperature. What I need is to extract the mean temperature by hour in order to plot the daily variation of temperature.

I'm new to R, but I gave it a try with what I know: I converted the hours to numbers, extracted the first two characters, and then calculated the mean, but it didn't work very well. Moreover, I have so many files to analyze that it would be much better to have something more automated and cleaner than the "solution" I found.

I believe there must be a better way to calculate averages by hour in R, so I've been looking for the answer in other posts here. Unfortunately, I couldn't find a clear answer about extracting statistics from time data.

My data looks like this

          date     hour temperature
1   28/12/2013 13:03:01      41.572
2   28/12/2013 13:08:01      46.059
3   28/12/2013 13:13:01       48.55
4   28/12/2013 13:18:01      49.546
5   28/12/2013 13:23:01      49.546
6   28/12/2013 13:28:01      49.546
7   28/12/2013 13:33:01      50.044
8   28/12/2013 13:38:01      50.542
9   28/12/2013 13:43:01      50.542
10  28/12/2013 13:48:01       51.04
11  28/12/2013 13:53:01      51.538
12  28/12/2013 13:58:01      51.538
13  28/12/2013 14:03:01      50.542
14  28/12/2013 14:08:01       51.04
15  28/12/2013 14:13:01       51.04
16  28/12/2013 14:18:01      52.534
17  28/12/2013 14:23:01      53.031
18  28/12/2013 14:28:01      53.031
19  28/12/2013 14:33:01      53.031
20  28/12/2013 14:38:01      51.538
21  28/12/2013 14:43:01      53.031
22  28/12/2013 14:48:01      53.529
etc. (24 hours of data)

And I would like R to calculate the average per hour (ignoring differences in minutes and seconds, grouping by hour only).

Any suggestion? Thank you very much in advance!

Regards, Maria


Solution

  • It is always easier to answer if sample data and the expected output are given in the question.

    Solution with the data.table package

    library(data.table)
    data <- fread('temp.csv', sep = ',')  # assuming your data is in temp.csv
    # if the data is already loaded as a data frame instead, convert it to a data.table
    data <- as.data.table(data)
    > str(data)
    Classes ‘data.table’ and 'data.frame':  12 obs. of  3 variables:
    $ date       : chr  "28/12/2013" "28/12/2013" "28/12/2013" "28/12/2013" ...
    $ hour       : chr  "13:03:01" "13:08:01" "13:13:01" "13:18:01" ...
    $ temperature: num  41.6 46.1 48.5 49.5 49.5 ...
    
    > data
          date     hour    temperature      avg
    1: 27/12/2013 13:00:00       42.99 35.78455
    2: 27/12/2013 14:00:00       65.97 35.78455
    3: 27/12/2013 15:00:00       63.57 35.78455 
    
    data[, list(avg = mean(temperature)), by = hour]  # grouped by the exact value of the hour column
        hour   avg
    1: 13:00:00 42.99
    2: 14:00:00 65.97
    3: 15:00:00 63.57
    data[, list(avg = mean(temperature)), by = "date,hour"]  # grouped by date, then by the exact value of the hour column
            date     hour   avg
    1: 27/12/2013 13:00:00 42.99
    2: 27/12/2013 14:00:00 65.97
    3: 27/12/2013 15:00:00 63.57
    
    # group by date and by the clock hour only (minutes and seconds are ignored)
    data[, list(avg = mean(temperature)), by = list(date, hour = hour(as.POSIXct(hour, format = "%H:%M:%S")))]
         date     hour    avg
    1: 27/12/2013    1 29.530
    2: 27/12/2013    4 65.970
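
The last grouping above can be tried on a few rows taken from the question. This is only a minimal sketch: the object name `temps` and the output columns `hr`, `avg` and `n` are illustrative choices, and it assumes the date is in dd/mm/yyyy order and the time is 24-hour "HH:MM:SS", as in the sample shown. If the real files store times with a.m./p.m., the format string would need to change (for example to use `%I` and `%p`).

```r
library(data.table)

# A few rows copied from the sample data in the question
temps <- data.table(
  date = "28/12/2013",
  hour = c("13:03:01", "13:08:01", "13:13:01", "13:58:01",
           "14:03:01", "14:08:01", "14:13:01"),
  temperature = c(41.572, 46.059, 48.55, 51.538,
                  50.542, 51.04, 51.04)
)

# Build a full timestamp from date + time, keep only its clock hour,
# and average the temperature within each (date, hour) group.
# Assumption: dd/mm/yyyy dates and 24-hour times; UTC is used only to
# avoid daylight-saving surprises while parsing.
res <- temps[, .(avg = mean(temperature), n = .N),
             by = .(date,
                    hr = data.table::hour(
                      as.POSIXct(paste(date, hour),
                                 format = "%d/%m/%Y %H:%M:%S",
                                 tz = "UTC")))]

print(res)
```

Parsing the date and time together, rather than taking the first two characters of the time string, keeps the grouping correct even if the time format changes, as long as the format string is updated to match. The `n` column shows how many readings went into each hourly mean.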