I need to categorize a very long df using a long list of criteria. Here is a simplified version of the criteria as a df:
crit <- data.frame(grp = c("g1", "g1", "g1", "g2", "g2", "g2"),
class = c("A", "B", "C", "A", "B", "C"),
min = c(1, 3, 5, 8, 10, 12),
max = c(3, 5, 8, 10, 12, 14)
)
A second df would receive a column containing "class" based on whether the value is linked to "grp" (part 1 of the procedure) and falls within the specified ranges (min, max) (part 2 of the procedure). Also, if a value is below the lowest or above the highest value in a range, it will be categorized as belonging to the lowest/highest "class." For example:
grp | val | class |
---|---|---|
g1 | 0 | A |
g1 | 1 | A |
g2 | 7 | A |
g2 | 11 | B |
df <- data.frame(grp = c("g1", "g1", "g2", "g2"),
val = c(0, 1, 7, 11)
)
Do you have any suggestions on how to do this using dplyr? Any help is very much appreciated.
One option is something like this:
df %>%
left_join(crit, by = "grp", relationship = "many-to-many") %>%
filter(val >= min & val <= max) %>%
select(-min, -max)
Essentially, it peforms a kind-of crossjoin, then filters to find the ones that match the criteria.
Another option is this:
# group everything by `grp`, so we just have one row for each `grp`, and a list of classes, mins and maxes
crit <- crit %>%
mutate(class = list(class), min = list(min), max = list(max), .by = "grp") %>%
distinct()
df %>%
left_join(crit, by = "grp") %>%
mutate(class = pmap(list(val, class, min, max), ~..2[..3 <= ..1 & ..1 <= ..4])) %>% # parallel map
select(-min, -max) %>%
unnest(class)