Search code examples
rcriteria

grep records on multiple criteria


I have a series of vendors and billing amounts, with billing amounts grouped into buckets.

I want to subset the dataset to only providers who have amounts in both ‘< 100’ buckets, as well as ’500-1000’ buckets or ‘> 1000’ buckets. Sample data:

df <- structure(list(GrossAmt = c(74.37, 69.69, 705.76, 694.12, 5243, 
2680.95, 23270, 64.31, 64.31, 64.31, 1863.6, 4030.38, 43.86, 
36.57, 37.29, 31.02, 59.43, 27.65), VenName = structure(c(3L, 
3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 1L, 1L, 
1L), .Label = c("Labcorp", "Quest Diagnostics Incorporated", 
"THOMAS JEFFERSON UNIV HOSPITAL", "WASHINGTON HOSPITAL CENTER"
), class = "factor"), AmtGrp = structure(c(1L, 1L, 3L, 3L, 2L, 
2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("<= 100", 
"> 1000", "500 - 1000"), class = "factor")), .Names = c("GrossAmt", 
"VenName", "AmtGrp"), class = "data.frame", row.names = c(NA, 
-18L))

In my example, the resulting dataset would have all records from TJU Hospital & Washington Hospital Center, since they have bills both < $100 & in one of the higher buckets. The other providers would be filtered out b/c they don't have bills > $500.

I would provide the work that I've done so far, but honestly not sure where to begin on this one so forgive me. My first instinct is that I need to set up a grep command for records based on the grouping criteria, but i don't know how to match based on the vendor's name.

Edit - Expanded Question:

How can I filter by any Vendor where they fall into more than one amt group, irrespective of what the specific amount group is?


Solution

  • library(dplyr)
    
    chain(df, group_by(VenName), 
              filter(any(AmtGrp == '<= 100'),
                     !all(AmtGrp == '<= 100')))
    

    Edit: second question

    chain(df, group_by(VenName), 
              filter(length(unique(AmtGrp)) > 1))