
How can I check whether a group contains the correct number of observations in R?


I have a data set with monthly results for each site. I need to delete any sites that don't have at least one sample from each season.

An example of the data is below:

df <- data.frame(site = c('D', 'D', 'D', 'D', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B'),
                 result = c('1', '2', '1.5', '3', '1.8', '7', '3.2', '4', '1','1.1', '3', '3.3', '2', '5', '4'),
                 season = c('w', 'sp', 'su', 'a', 'sp', 'sp', 'sp', 'su', 'a','a', 'w', 'w', 'sp', 'w', 's'))

In this case, all the data for sites D and A would be retained, as each has at least one sample per season, but all the data for site B would be deleted.

I am struggling with the logic steps of how to do this and would appreciate some pointers please. I am doing this in R. I think I need to group_by site but then I don't know what I should do next.


Solution

    library(dplyr)

    # Keep only the sites whose rows cover four distinct season values
    df %>%
      group_by(site) %>%
      filter(length(unique(season)) == 4) %>%
      ungroup()
    

    Output:

    # A tibble: 12 x 3
       site  result season
       <chr> <chr>  <chr> 
     1 D     1      w     
     2 D     2      sp    
     3 D     1.5    su    
     4 D     3      a     
     5 A     1.8    sp    
     6 A     7      sp    
     7 A     3.2    sp    
     8 A     4      su    
     9 A     1      a     
    10 A     1.1    a     
    11 A     3      w     
    12 A     3.3    w
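
One caveat with counting distinct values: a site with four different season labels passes even if one of them is a typo (for example 's' alongside 'w', 'sp' and 'su') and a real season is missing. A possible variant, sketched below, checks each site against an explicit list of required seasons instead. The vector name `required_seasons` and the four codes in it are assumptions taken from the example data, not something the question states, so adjust them to match your real data. The second pipeline is only a diagnostic that shows how many distinct seasons each site has.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  site   = c('D', 'D', 'D', 'D', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B'),
  result = c('1', '2', '1.5', '3', '1.8', '7', '3.2', '4', '1', '1.1', '3', '3.3', '2', '5', '4'),
  season = c('w', 'sp', 'su', 'a', 'sp', 'sp', 'sp', 'su', 'a', 'a', 'w', 'w', 'sp', 'w', 's')
)

# Assumed season codes (hypothetical name; taken from the example data)
required_seasons <- c('w', 'sp', 'su', 'a')

# Keep a site only if every required season appears at least once in it
kept <- df %>%
  group_by(site) %>%
  filter(all(required_seasons %in% season)) %>%
  ungroup()

# Diagnostic: number of distinct seasons recorded per site
season_counts <- df %>%
  group_by(site) %>%
  summarise(n_seasons = n_distinct(season))
```

With the example data, `kept` holds the rows for sites D and A only, the same result as the `length(unique(season)) == 4` filter; the two approaches differ only when a site has unexpected season labels.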