Search code examples
rdataframedplyrgrouping

Drop unit if in at least one column its observations are all missing


I have a dataframe like the one below, with units ('teams') observed over time. I want to identify and drop all teams where in at least one column all observations are NA. In this example, only team "A" would not be dropped.

df <- data.frame(team=c("A", "A", "B", "B", "C", "C"),
                 year=c(1992, 1993, 1992, 1993, 1992, 1993),
                 points=c(NA, 15, 17, 24, NA, NA),
                 assists=c(4, 7, NA, NA, 12, NA))

  team year points assists
1    A 1992     NA       4
2    A 1993     15       7
3    B 1992     17      NA
4    B 1993     24      NA
5    C 1992     NA      12
6    C 1993     NA      NA

I have a slight preference for dplyr, but any solution is welcome. I can think of doing group_by() and filter() but don't know how to do the "at least one column" part. Thanks a lot for your help!


Solution

  • You could use if_any/if_all:

    library(dplyr)
    
    df %>%
      group_by(team) %>%
      filter(!if_any(everything(), ~ all(is.na(.x)))) %>%
      ungroup()
    

    or

    df %>%
      group_by(team) %>%
      filter(if_all(everything(), ~ !all(is.na(.x)))) %>%
      ungroup()
    
    # # A tibble: 2 × 4
    #   team   year points assists
    #   <chr> <dbl>  <dbl>   <dbl>
    # 1 A      1992     NA       4
    # 2 A      1993     15       7
    

    everything() can be skipped (but the comma remains) because it's the default of if_any/if_all. I.e.

    filter(if_all(, ~ !all(is.na(.x))))