I have a series of vendors and billing amounts, with billing amounts grouped into buckets.
I want to subset the dataset to only providers who have amounts in both ‘< 100’ buckets, as well as ’500-1000’ buckets or ‘> 1000’ buckets. Sample data:
df <- structure(list(GrossAmt = c(74.37, 69.69, 705.76, 694.12, 5243,
2680.95, 23270, 64.31, 64.31, 64.31, 1863.6, 4030.38, 43.86,
36.57, 37.29, 31.02, 59.43, 27.65), VenName = structure(c(3L,
3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 1L, 1L,
1L), .Label = c("Labcorp", "Quest Diagnostics Incorporated",
"THOMAS JEFFERSON UNIV HOSPITAL", "WASHINGTON HOSPITAL CENTER"
), class = "factor"), AmtGrp = structure(c(1L, 1L, 3L, 3L, 2L,
2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("<= 100",
"> 1000", "500 - 1000"), class = "factor")), .Names = c("GrossAmt",
"VenName", "AmtGrp"), class = "data.frame", row.names = c(NA,
-18L))
In my example, the resulting dataset would have all records from TJU Hospital & Washington Hospital Center, since they have bills both < $100 & in one of the higher buckets. The other providers would be filtered out b/c they don't have bills > $500.
I would provide the work that I've done so far, but honestly not sure where to begin on this one so forgive me. My first instinct is that I need to set up a grep command for records based on the grouping criteria, but i don't know how to match based on the vendor's name.
Edit - Expanded Question:
How can I filter by any Vendor where they fall into more than one amt group, irrespective of what the specific amount group is?
library(dplyr)
chain(df, group_by(VenName),
filter(any(AmtGrp == '<= 100'),
!all(AmtGrp == '<= 100')))
Edit: second question
chain(df, group_by(VenName),
filter(length(unique(AmtGrp)) > 1))