I'm having trouble calculating the average temperature by hour.
I have a data frame with date, time (hh:mm:ss p.m./a.m.) and temperature. What I need is to extract the mean temperature by hour in order to plot the daily variation of temperature.
I'm new to R, but I gave it a try with what I know: I converted the hours into numbers, extracted the first two characters, and then calculated the mean, but it didn't work very well. Moreover, I have so many files to analyze that it would be much better to have something more automated and cleaner than the "solution" I found.
I believe there must be a better way to calculate averages by hour in R, so I've been looking for the answer in other posts here. Unfortunately, I couldn't find a clear answer about extracting statistics from time data.
My data looks like this:
date hour temperature
1 28/12/2013 13:03:01 41.572
2 28/12/2013 13:08:01 46.059
3 28/12/2013 13:13:01 48.55
4 28/12/2013 13:18:01 49.546
5 28/12/2013 13:23:01 49.546
6 28/12/2013 13:28:01 49.546
7 28/12/2013 13:33:01 50.044
8 28/12/2013 13:38:01 50.542
9 28/12/2013 13:43:01 50.542
10 28/12/2013 13:48:01 51.04
11 28/12/2013 13:53:01 51.538
12 28/12/2013 13:58:01 51.538
13 28/12/2013 14:03:01 50.542
14 28/12/2013 14:08:01 51.04
15 28/12/2013 14:13:01 51.04
16 28/12/2013 14:18:01 52.534
17 28/12/2013 14:23:01 53.031
18 28/12/2013 14:28:01 53.031
19 28/12/2013 14:33:01 53.031
20 28/12/2013 14:38:01 51.538
21 28/12/2013 14:43:01 53.031
22 28/12/2013 14:48:01 53.529
etc. (24 hours of data)
I would like R to calculate the average per hour (ignoring differences in minutes and seconds, grouping just by hour).
Any suggestions? Thank you very much in advance!
Regards, Maria
It is always easier when sample data and expected output are given in the question.
A solution with the data.table package:
library(data.table)
data <- fread('temp.csv', sep=',') # assuming your data is in temp.csv
# if the data is already loaded as a data frame instead, convert it to a data.table
data <- data.table(data)
> str(data)
Classes ‘data.table’ and 'data.frame': 12 obs. of 3 variables:
$ date : chr "28/12/2013" "28/12/2013" "28/12/2013" "28/12/2013" ...
$ hour : chr "13:03:01" "13:08:01" "13:13:01" "13:18:01" ...
$ temperature: num 41.6 46.1 48.5 49.5 49.5 ...
> data
date hour temperature avg
1: 27/12/2013 13:00:00 42.99 35.78455
2: 27/12/2013 14:00:00 65.97 35.78455
3: 27/12/2013 15:00:00 63.57 35.78455
data[,list(avg=mean(temperature)),by=hour] # groups by the full time string in the hour column
hour avg
1: 13:00:00 42.99
2: 14:00:00 65.97
3: 15:00:00 63.57
data[,list(avg=mean(temperature)),by="date,hour"] # groups by date, then by the full time string
date hour avg
1: 27/12/2013 13:00:00 42.99
2: 27/12/2013 14:00:00 65.97
3: 27/12/2013 15:00:00 63.57
data[,list(avg=mean(temperature)),by=list(date,hour(as.POSIXct(data$hour, format = "%H:%M:%S")))] # groups by date and by hour of day only (minutes and seconds are ignored)
date hour avg
1: 27/12/2013 1 29.530
2: 27/12/2013 4 65.970
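For comparison, the same grouping can be sketched in base R without extra packages. This is only a minimal sketch under a few assumptions: the data frame name `temps` and the column `hour_of_day` are made up for illustration, the rows are a handful copied from the question, and the times are taken to be in 24-hour form as in the sample (a 12-hour a.m./p.m. format would need a different format string).

```r
# A few rows copied from the question; the name `temps` is hypothetical.
temps <- data.frame(
  date = rep("28/12/2013", 5),
  hour = c("13:03:01", "13:08:01", "13:13:01", "14:03:01", "14:08:01"),
  temperature = c(41.572, 46.059, 48.55, 50.542, 51.04),
  stringsAsFactors = FALSE
)

# Parse date and time together; a fixed time zone avoids local-time surprises.
stamp <- as.POSIXct(paste(temps$date, temps$hour),
                    format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Keep only the hour of day (0-23), dropping minutes and seconds.
temps$hour_of_day <- as.integer(format(stamp, "%H"))

# Mean temperature for each date and hour of day.
hourly <- aggregate(temperature ~ date + hour_of_day, data = temps, FUN = mean)
print(hourly)
```

The resulting `hourly` data frame has one row per date and hour, so something like `plot(hourly$hour_of_day, hourly$temperature, type = "b")` should give the daily variation. For many files, the same steps could be wrapped in a function and applied to each file in turn.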