Search code examples
rdataframedplyrmean

Average count per group


I am real close but I am missing the last step, could you tell me what I need to change below?

My dataset (mydata) looks like this (the first few entries are missing):

date_daily station_date air_min air_avg air_max rain_mm temp_davg_c temp_dmax_c temp_dmin_c       date months years
1 2006/01/01         <NA>    <NA>    <NA>    <NA>      NA          NA          NA          NA 2006-01-01     01  2006
2 2006/01/02         <NA>    <NA>    <NA>    <NA>      NA          NA          NA          NA 2006-01-02     01  2006
3 2006/01/03         <NA>    <NA>    <NA>    <NA>      NA          NA          NA          NA 2006-01-03     01  2006
4 2006/01/04         <NA>    <NA>    <NA>    <NA>      NA          NA          NA          NA 2006-01-04     01  2006
5 2006/01/05         <NA>    <NA>    <NA>    <NA>      NA          NA          NA          NA 2006-01-05     01  2006
6 2006/01/06         <NA>    <NA>    <NA>    <NA>      NA          NA          NA          NA 2006-01-06     01  2006

The daily data ranges from 2006 to 2016, so I am interested in the average monthly rain as well as average days per month with rain > 2mm.

So I tried the following:

library(dplyr)
stats <- mydata %>% group_by(months) %>%
  summarise(days_w_rain = length(rain_mm[!is.na(rain_mm)]),
            avg_rain = mean(rain_mm, na.rm=TRUE),
            days_2mm = sum(rain_mm > 2, na.rm=TRUE),
            p2mm = days_2mm / days_w_rain*100)

which gives me the following:

# A tibble: 12 x 5
   months days_w_rain   avg_rain days_2mm      p2mm
    <chr>       <int>      <dbl>    <int>     <dbl>
 1     01         304 11.8078520      148 48.684211
 2     02         310  7.4946903      125 40.322581
 3     03         334  9.2680090      156 46.706587
 4     04         327  3.7469113       92 28.134557
 5     05         310  1.6073032       39 12.580645
 6     06         300  1.1322100       39 13.000000
 7     07         310  0.6212226       25  8.064516
 8     08         333  1.6641862       47 14.114114
 9     09         308  4.2796169       71 23.051948
10     10         310 10.7842742      155 50.000000
11     11         300 18.4062900      189 63.000000
12     12         287 20.6697038      155 54.006969

this is the correct number for the average rain per month however it gives me the total number of days in all Januaries that have rain (i.e. 304 days), instead of an average per the month of January, same with the amount of days that have >2mm of rain (i.e. 148 days over all months of January from 2006 to 2016. I want the average per month and not the total).

I am sure it is an easy fix but I don't know how to do it? any suggestions are appreciated cheers Sandra


Solution

  • Assuming you have data for all the years (2006 to 2016) this will give the desired output:

     stats <- mydata %>% group_by(months) %>%
              summarise(days_w_rain = length(rain_mm[!is.na(rain_mm)]/length(unique(years))),
                        avg_rain = mean(rain_mm, na.rm=TRUE),
                        days_2mm = sum(rain_mm > 2, na.rm=TRUE)/length(unique(years)),
                        p2mm = days_2mm / days_w_rain*100)
    

    This is a quick fix. Just dividing by number of years and you have the average.