Search code examples
rdplyrtidyverts

I want to filter group id's specific conditions meeting on both column and some row value in r


I have sample data and I want to filter the number of id's never had sup status while on type ==N, meaning I only choose id with status == unsup while before switching type, and then the number id's who switched from N to P.

  • for example id==1 never had status==sup while on type==N, So I need to count id 1. Then I want to check this id also whether switching to P. But id 2 is not eligible for selected because it have sup status while on type==N.

  • id's 2, 5, and id 7 will not be eligible for as they had status == sup, while on status N and id 7 was on NA only while on N.

data <- data.frame(id=c(1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,6,6,6,6,6,7,7,7),
                   type=c('N','N','N', 'N', 'P','P','N','N','N', 'I', 'I','N','N','N',
 'N', 'N','N','N','N', 'O', 'O','N','N','N', 'O','N','N','N', 'P', 'P', 'N','N','P'), 
status=c(NA,'unsup',NA,'unsup',NA,'sup',NA,NA,'sup',NA,'sup','unsup',NA,'unsup',NA,
'unsup','unsup',NA,'unsup',NA,'sup','sup',NA,NA,'unsup',NA,'unsup','unsup','unsup','sup', NA, NA, 'sup'))

Expected output

1.

   id type status
1   1    N   <NA>
2   1    N  unsup
3   1    N   <NA>
4   1    N  unsup
5   1    P   <NA>
6   1    P    sup
7   3    N  unsup
8   3    N   <NA>
9   3    N  unsup
10  3    N   <NA>
11  3    N  unsup
12  4    N  unsup
13  4    N   <NA>
14  4    N  unsup
15  4    O   <NA>
16  4    O    sup
17  6    N   <NA>
18  6    N  unsup
19  6    N  unsup
20  6    P  unsup
21  6    P    sup

Then of which, id's switched to P are:

   id type status
1   1    N   <NA>
2   1    N  unsup
3   1    N   <NA>
4   1    N  unsup
5   1    P   <NA>
6   1    P    sup
7   6    N   <NA>
8   6    N  unsup
9   6    N  unsup
10  6    P  unsup
11  6    P    sup

Solution

  • For the first case, after grouping by 'id', filter any 'id's not having status value as 'sup' and type as 'N' and those ids having any non-NA value for status where type is 'N'

    library(dplyr)
    data1 <- data %>% 
      group_by(id) %>%
      filter((!any((status %in% 'sup' & type == 'N'), na.rm = TRUE))& 
          any(!is.na(status[type == "N"]))) %>% 
      ungroup
    

    -output

    data1
    # A tibble: 21 × 3
          id type  status
       <dbl> <chr> <chr> 
     1     1 N     <NA>  
     2     1 N     unsup 
     3     1 N     <NA>  
     4     1 N     unsup 
     5     1 P     <NA>  
     6     1 P     sup   
     7     3 N     unsup 
     8     3 N     <NA>  
     9     3 N     unsup 
    10     3 N     <NA>  
    # … with 11 more rows
    

    From the subset data, we can filter again after checking for any case where there is a type value of 'N' and the next value (lead) is 'P' for each 'id'

    data1 %>% 
      group_by(id) %>%
      filter(any(type== "N" & lead(type) == "P", na.rm = TRUE)) %>% 
      ungroup
    # A tibble: 11 × 3
          id type  status
       <dbl> <chr> <chr> 
     1     1 N     <NA>  
     2     1 N     unsup 
     3     1 N     <NA>  
     4     1 N     unsup 
     5     1 P     <NA>  
     6     1 P     sup   
     7     6 N     <NA>  
     8     6 N     unsup 
     9     6 N     unsup 
    10     6 P     unsup 
    11     6 P     sup