Remove specific rows when all specified variables are NA, using dplyr


Given the following data ...

x <- data.frame("Y" = 2000:2010,
                "A" = c(0, NA, 1, 1, 0, 1, NA, NA, 1, 0, NA),
                "B" = c(0, 0, NA, 1, 1, 0, NA, NA, 0, 1, NA)) 

... I was able to remove all rows in which the specified columns contain only NA values, following this wonderful answer.

x |> dplyr::filter(if_any(c("A", "B"), ~ !is.na(.x)))
#>      Y  A  B
#> 1 2000  0  0
#> 2 2001 NA  0
#> 3 2002  1 NA
#> 4 2003  1  1
#> 5 2004  0  1
#> 6 2005  1  0
#> 7 2008  1  0
#> 8 2009  0  1

As I don't have much experience with dplyr yet, I can't figure out how to modify this expression so that filter() is applied only to specific rows, e.g. the last one.

The expected result would look like this, dropping Y = 2010 and keeping Y = 2006 and Y = 2007:

#>       Y  A  B
#> 1  2000  0  0
#> 2  2001 NA  0
#> 3  2002  1 NA
#> 4  2003  1  1
#> 5  2004  0  1
#> 6  2005  1  0
#> 7  2006 NA NA
#> 8  2007 NA NA
#> 9  2008  1  0
#> 10 2009  0  1

Solution

  • To target only the last row, one can combine row_number() with n(): the condition row_number() == n() is TRUE only for the last row.

    See ?row_number and ?n for more on these functions (n() is documented on the same help page as cur_group()).

    library(dplyr)
    
    x %>%
      # drop the last row only if both A and B are NA
      filter(!(row_number() == n() & if_all(A:B, is.na)))
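
The same idea should extend to any set of rows, not just the last one. The following is a minimal sketch, assuming the rows to check are identified by their Y values; the name target_years is hypothetical and not part of the original question. A row is dropped only if it is one of the target rows and both A and B are NA, which is expressed here with if_all().

```r
library(dplyr)

x <- data.frame("Y" = 2000:2010,
                "A" = c(0, NA, 1, 1, 0, 1, NA, NA, 1, 0, NA),
                "B" = c(0, 0, NA, 1, 1, 0, NA, NA, 0, 1, NA))

# Hypothetical helper vector: the years whose rows may be dropped
target_years <- c(2010)

res <- x %>%
  # Drop a row only if it is a target row AND both A and B are NA;
  # all-NA rows outside target_years (2006, 2007) are kept
  filter(!(Y %in% target_years & if_all(c(A, B), is.na)))

res
```

To select rows by position instead, the condition Y %in% target_years could be swapped for something like row_number() %in% c(7, 11), again keeping the if_all() part unchanged.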