r, dplyr, filter, filtering, subset

Filtering table based on the latest row condition


I have a table like the following:

date  user  X1 X2 X3
1/1     1    0  3 34 
2/1     1    0  7 65
3/1     1    0  0  0
4/1     1   25  4 65
1/1     2  285  0  0
2/1     2    0  0  0
3/1     2    0 54  0
4/1     2    0  0  0

How can I use dplyr to select only the users that have non-zero data at the last available date (4/1) across the X columns? In this case, user 2 should be removed. Thanks.


Solution

  • Another option is to use any with c_across to check whether any of the X values is non-zero in the last row of each user (row_number() == n()), like this:

    library(dplyr)
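    df %>%
      group_by(user) %>%
      # keep all rows of a user if any X column is non-zero in that user's
      # last row (rows are assumed to be ordered by date within each user)
      filter(any(c_across(starts_with("X")) != 0 & row_number() == n()))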
    #> # A tibble: 4 × 5
    #> # Groups:   user [1]
    #>   date   user    X1    X2    X3
    #>   <chr> <int> <int> <int> <int>
    #> 1 1/1       1     0     3    34
    #> 2 2/1       1     0     7    65
    #> 3 3/1       1     0     0     0
    #> 4 4/1       1    25     4    65
    

    Created on 2023-03-15 with reprex v2.0.2
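
The one-liner above packs the last-row check and the non-zero check into a single expression. If that feels too implicit, a more explicit variant could split the logic into separate steps. This is only a sketch under a few assumptions: the data frame is rebuilt from the table in the question (column types are assumed), the names last_rows and keep_users are made up for illustration, and the rows are assumed to be sorted by date within each user.

```r
library(dplyr)

# Example data rebuilt from the table in the question (column types assumed).
df <- tibble(
  date = rep(c("1/1", "2/1", "3/1", "4/1"), times = 2),
  user = rep(1:2, each = 4),
  X1   = c(0L, 0L, 0L, 25L, 285L, 0L, 0L, 0L),
  X2   = c(3L, 7L, 0L, 4L, 0L, 0L, 54L, 0L),
  X3   = c(34L, 65L, 0L, 65L, 0L, 0L, 0L, 0L)
)

# Step 1: take each user's last row (assumes rows are ordered by date).
last_rows <- df %>%
  group_by(user) %>%
  slice_tail(n = 1) %>%
  ungroup()

# Step 2: users whose last row has at least one non-zero X column.
# Swap if_any() for if_all() to require every X column to be non-zero.
keep_users <- last_rows %>%
  filter(if_any(starts_with("X"), ~ .x != 0)) %>%
  select(user)

# Step 3: keep all rows belonging to those users.
result <- df %>%
  semi_join(keep_users, by = "user")

print(result)
```

With the sample data, only the four rows of user 1 remain, since user 2 has zeros in every X column on 4/1. The question's wording ("for all Xs") could also be read as requiring every X column to be non-zero on the last date; in that case if_all() would be the better fit, and for this data it gives the same result.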