I have count data of species at multiple sites with information of site, year, month and day. Within this data, for several species there are multiple entries on certain days. For example, on 3rd of January, 2016, for species A1 there are two entries i.e. 10 and 20. At first step, I want to take the maximum value of this day i.e. 20. In the second step, if there are more than one day sampled per month per site then I want to take the mean per month. Example is given below
species site year month day total
A1 GG 2016 1 3 10
A1 GG 2016 1 3 20
A1 GG 2016 1 4 22
A2 GG 2016 1 5 32
A2 GG 2016 1 6 34
A3 GG 2016 1 9 23
should look like something like this
species site year month day total
A1 GG 2016 1 3.5 21
A2 GG 2016 1 5.5 33
A3 GG 2016 1 9 23
We group by the first five columns i.e. 'species', 'site', 'year', 'month', 'day', summarise
to get the max
of 'total', then group by without the 'day' and get the mean
of 'day' and 'total'
library(dplyr)
df1 %>%
group_by_at(names(.)[1:5]) %>%
summarise(total = max(total)) %>%
group_by_at(names(.)[1:4]) %>%
summarise_all(mean)
# A tibble: 3 x 6
# Groups: species, site, year [?]
# species site year month day total
# <chr> <chr> <int> <int> <dbl> <dbl>
#1 A1 GG 2016 1 3.50 21.0
#2 A2 GG 2016 1 5.50 33.0
#3 A3 GG 2016 1 9.00 23.0