I am real close but I am missing the last step, could you tell me what I need to change below?
My dataset (mydata) looks like this (the first few entries are missing):
date_daily station_date air_min air_avg air_max rain_mm temp_davg_c temp_dmax_c temp_dmin_c date months years
1 2006/01/01 <NA> <NA> <NA> <NA> NA NA NA NA 2006-01-01 01 2006
2 2006/01/02 <NA> <NA> <NA> <NA> NA NA NA NA 2006-01-02 01 2006
3 2006/01/03 <NA> <NA> <NA> <NA> NA NA NA NA 2006-01-03 01 2006
4 2006/01/04 <NA> <NA> <NA> <NA> NA NA NA NA 2006-01-04 01 2006
5 2006/01/05 <NA> <NA> <NA> <NA> NA NA NA NA 2006-01-05 01 2006
6 2006/01/06 <NA> <NA> <NA> <NA> NA NA NA NA 2006-01-06 01 2006
The daily data ranges from 2006 to 2016, so I am interested in the average monthly rain as well as average days per month with rain > 2mm.
So I tried the following:
library(dplyr)
stats <- mydata %>% group_by(months) %>%
summarise(days_w_rain = length(rain_mm[!is.na(rain_mm)]),
avg_rain = mean(rain_mm, na.rm=TRUE),
days_2mm = sum(rain_mm > 2, na.rm=TRUE),
p2mm = days_2mm / days_w_rain*100)
which gives me the following:
# A tibble: 12 x 5
months days_w_rain avg_rain days_2mm p2mm
<chr> <int> <dbl> <int> <dbl>
1 01 304 11.8078520 148 48.684211
2 02 310 7.4946903 125 40.322581
3 03 334 9.2680090 156 46.706587
4 04 327 3.7469113 92 28.134557
5 05 310 1.6073032 39 12.580645
6 06 300 1.1322100 39 13.000000
7 07 310 0.6212226 25 8.064516
8 08 333 1.6641862 47 14.114114
9 09 308 4.2796169 71 23.051948
10 10 310 10.7842742 155 50.000000
11 11 300 18.4062900 189 63.000000
12 12 287 20.6697038 155 54.006969
this is the correct number for the average rain per month however it gives me the total number of days in all Januaries that have rain (i.e. 304 days), instead of an average per the month of January, same with the amount of days that have >2mm of rain (i.e. 148 days over all months of January from 2006 to 2016. I want the average per month and not the total).
I am sure it is an easy fix but I don't know how to do it? any suggestions are appreciated cheers Sandra
Assuming you have data for all the years (2006 to 2016) this will give the desired output:
stats <- mydata %>% group_by(months) %>%
summarise(days_w_rain = length(rain_mm[!is.na(rain_mm)]/length(unique(years))),
avg_rain = mean(rain_mm, na.rm=TRUE),
days_2mm = sum(rain_mm > 2, na.rm=TRUE)/length(unique(years)),
p2mm = days_2mm / days_w_rain*100)
This is a quick fix. Just dividing by number of years and you have the average.