I have a data set with monthly results for each site. I need to delete any sites that don't have at least one sample from each season.
An example of the data is below:
df <- data.frame(site = c('D', 'D', 'D', 'D', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B'),
                 result = c('1', '2', '1.5', '3', '1.8', '7', '3.2', '4', '1', '1.1', '3', '3.3', '2', '5', '4'),
                 season = c('w', 'sp', 'su', 'a', 'sp', 'sp', 'sp', 'su', 'a', 'a', 'w', 'w', 'sp', 'w', 's'))
In this case, all the data for sites D and A would be retained, as each has at least one sample per season, but all the data for site B would be deleted.
I am struggling with the logic of how to do this and would appreciate some pointers. I am working in R. I think I need to group_by site, but I don't know what to do next.
library(dplyr)

df %>%
  group_by(site) %>%
  # keep a site's rows only if it has 4 distinct season codes
  filter(length(unique(season)) == 4) %>%
  ungroup()
output:
# A tibble: 12 x 3
   site  result season
   <chr> <chr>  <chr>
 1 D     1      w
 2 D     2      sp
 3 D     1.5    su
 4 D     3      a
 5 A     1.8    sp
 6 A     7      sp
 7 A     3.2    sp
 8 A     4      su
 9 A     1      a
10 A     1.1    a
11 A     3      w
12 A     3.3    w
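
Counting distinct values works here, but it would also keep a site whose four distinct codes include a stray or mistyped one. A possible variant, sketched in base R so it needs no packages, checks each site against an explicit list of season codes instead. The names `required`, `complete` and `kept` are illustrative, and the assumption that the four valid codes are exactly 'w', 'sp', 'su' and 'a' is mine, taken from the example data.

```r
# Example data from the question
df <- data.frame(
  site   = c('D', 'D', 'D', 'D', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B'),
  result = c('1', '2', '1.5', '3', '1.8', '7', '3.2', '4', '1', '1.1', '3', '3.3', '2', '5', '4'),
  season = c('w', 'sp', 'su', 'a', 'sp', 'sp', 'sp', 'su', 'a', 'a', 'w', 'w', 'sp', 'w', 's')
)

# Assumed set of valid season codes (not stated explicitly in the question)
required <- c('w', 'sp', 'su', 'a')

# For each site: TRUE when every required season code appears at least once
complete <- tapply(df$season, df$site, function(s) all(required %in% s))

# Keep only the rows belonging to sites that passed the check
kept <- df[df$site %in% names(complete)[complete], ]
```

With the example data, `kept` should contain the same rows for sites D and A as the dplyr result above. The same condition can be used inside the grouped filter as `filter(all(required %in% season))`, given the `required` vector defined above.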