r dplyr filter conditional-statements subset

I am trying to filter on two conditions, but I keep removing all patients with either condition


I'm a beginner in R, so apologies for any errors, and thank you for helping.

I have a dataset (liver) where rows are patient ID numbers, and columns include what region the patient resides in (London, Yorkshire, etc.) and what unit the patient was treated in (hospital name). Some of the units are private units. I've identified 120 patients from London, of whom 100 were treated across three private units. I want to remove the 100 London patients treated in private units, but I keep accidentally removing all patients treated in the private units (around 900 patients). I'd be grateful for advice on how to remove only the London patients treated privately.

I've tried various combinations of subset and filter, with exclamation marks and brackets in different places, for example:

liver <- filter(liver, region_name != "London" & unit_name!="Primrose Hospital" & unit_name != "Oak Hospital" & unit_name != "Wilson Hospital")

Thank you very much.


Solution

  • Building on Pariksheet's great start (which still drops private hospital patients from outside London): we need to use the OR operator | within the filter function. The rows to remove are those that are in an excluded region AND in a private hospital, so the rows to keep are the negation of that: not in an excluded region OR not in a private hospital. I've made an example data frame which demonstrates how this works for your case. The example tibble contains your three private hospitals plus one non-private hospital (State Hospital), whose patients we want to keep. It also has Manchester patients who attend either Manch General or one of the private hospitals, all of whom we want to keep.
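
    To see why OR is needed here, a minimal sketch with made-up logical flags may help. The vectors in_london and in_private are hypothetical and stand for four kinds of patient; they are not taken from the real data.

    ```r
    # Hypothetical flags for four kinds of patient (illustrative only)
    in_london  <- c(TRUE,  TRUE,  FALSE, FALSE)
    in_private <- c(TRUE,  FALSE, TRUE,  FALSE)

    # What we want: keep everyone except rows where BOTH flags are TRUE
    keep_wanted <- !(in_london & in_private)

    # The same condition rewritten with OR (De Morgan's law)
    keep_or <- !in_london | !in_private

    # AND of the negations, as in the question's attempt:
    # keeps only rows where NEITHER flag is TRUE
    keep_and <- !in_london & !in_private

    identical(keep_wanted, keep_or)  # TRUE
    keep_and                         # FALSE FALSE FALSE  TRUE
    ```

    The AND version throws away the London non-private patient and the non-London private patient as well, which matches the over-removal described in the question.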

    EDITED: Now includes character vectors to allow generalisation of combinations to exclude.

    library(dplyr)  # provides tibble(), filter() and %>%
    
    liver <- tibble(region_name = rep(c('London', 'Liverpool', 'Glasgow', 'Manchester'), each = 4),
                    unit_name = c(rep(c('Primrose Hospital',
                                  'Oak Hospital',
                                  'Wilson Hospital',
                                  'State Hospital'), times = 3), 
                                  rep(c('Manch General', 'Primrose Hospital'), each = 2)))
    
    liver
    
    # A tibble: 16 x 2
       region_name unit_name        
       <chr>       <chr>            
     1 London      Primrose Hospital
     2 London      Oak Hospital     
     3 London      Wilson Hospital  
     4 London      State Hospital   
     5 Liverpool   Primrose Hospital
     6 Liverpool   Oak Hospital     
     7 Liverpool   Wilson Hospital  
     8 Liverpool   State Hospital   
     9 Glasgow     Primrose Hospital
    10 Glasgow     Oak Hospital     
    11 Glasgow     Wilson Hospital  
    12 Glasgow     State Hospital   
    13 Manchester  Manch General    
    14 Manchester  Manch General    
    15 Manchester  Primrose Hospital
    16 Manchester  Primrose Hospital
    
    excl.private.regions <- c('London', 
                              'Liverpool', 
                              'Glasgow')
    excl.private.hospitals <- c('Primrose Hospital',
                                'Oak Hospital',
                                'Wilson Hospital')
    
    liver %>% 
      filter(! region_name %in% excl.private.regions |
               ! unit_name %in% excl.private.hospitals)
    
    # A tibble: 7 x 2
      region_name unit_name        
      <chr>       <chr>            
    1 London      State Hospital   
    2 Liverpool   State Hospital   
    3 Glasgow     State Hospital   
    4 Manchester  Manch General    
    5 Manchester  Manch General    
    6 Manchester  Primrose Hospital
    7 Manchester  Primrose Hospital
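
    For the case in the question, where only London patients treated in the three private units should be removed, the same idea can be written as a single negated condition. This is a sketch in base R on a small stand-in data frame; liver_demo, private_units, drop_row and liver_kept are names made up for illustration, and the rows are not the asker's data.

    ```r
    # Small stand-in for the real data (rows are illustrative only)
    liver_demo <- data.frame(
      region_name = c("London", "London", "Manchester", "Manchester"),
      unit_name   = c("Primrose Hospital", "State Hospital",
                      "Primrose Hospital", "Manch General"),
      stringsAsFactors = FALSE
    )

    private_units <- c("Primrose Hospital", "Oak Hospital", "Wilson Hospital")

    # TRUE only for London patients treated in a private unit
    drop_row <- liver_demo$region_name == "London" &
      liver_demo$unit_name %in% private_units

    # Keep every other row
    liver_kept <- liver_demo[!drop_row, ]
    liver_kept
    ```

    The same logical expression should also work inside dplyr, for example filter(liver, !(region_name == "London" & unit_name %in% private_units)). If region_name contains missing values, it is worth checking how those rows should be treated, since comparisons with NA return NA.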