Search code examples
rif-statementdplyrcoalescecase-when

Conditional Mutate Based on Results of Coalesce in dplyr


I've been teaching myself R from scratch, basically by undertaking something, then reading posts like these, and trial-and-error based on that. Sometimes I hit a wall and reach out.

I've hit a wall. I have dplyr 0.7 installed. I have a tibble with a column - call it contract_key - I added by applying mutate(coalesce()) to three other columns in the tibble. Here's sample data:

product <- c("655393265191","655393265191","168145850127","168145850127","350468621217","350468621217","977939797847","NA","928893912852")
supplier <- c("person5","person3","person10","person5","person11","person5","person11","person14","person5")
vendor <- c("org2","org3","org3","org2","org1","org2","org1","org5","org2")
quantity <- c(7,5,6,1,2,1,18,2,2)
gross <- c(0.0419,0.0193,0.0439,0.0069,0.0027,0.0055,0.0233,NA,0.0004)

df <- data_frame(product,supplier,vendor,quantity,gross)

Here's how I generated contract_key:

df <- df %>% 
  mutate(contract_key = coalesce(product,supplier,vendor))

I now want to add another column that categorizes the contents of contract_key based on which of the three columns provided the content (through coalesce()). So if contract_key ="person5", for example, the new column, contract_level, would be "supplier". And contract_key="org2" would map to contract_level = "vendor", etc.

Essentially, I'll be using contract_level as a join variable to another tibble.

I'm stumped. I've tried if_else, and I see that I shouldn't bother trying case_when (because it's inside mutate()). I've also tried nesting if_else's to no avail.

It's probably basic R syntax that I just don't know. Something to do with dot notation and grammar. If someone supplies the answer, I will backtrace until I figure out what you did. (And I'll have learned a new lesson in R!)

Thanks!


Solution

  • How about this:

    df %>% mutate(contract_key = coalesce(product,supplier,vendor),
                  contract_level = case_when(contract_key %in% product ~ "product",
                                             contract_key %in% supplier ~ "supplier",
                                             contract_key %in% vendor ~ "vendor",
                                             TRUE ~ "none"))
    
           product supplier vendor quantity  gross contract_key contract_level
    1 655393265191  person5   org2        7 0.0419 655393265191        product
    2 655393265191  person3   org3        5 0.0193 655393265191        product
    3 168145850127 person10   org3        6 0.0439 168145850127        product
    4 168145850127  person5   org2        1 0.0069 168145850127        product
    5 350468621217 person11   org1        2 0.0027 350468621217        product
    6 350468621217  person5   org2        1 0.0055 350468621217        product
    7 977939797847 person11   org1       18 0.0233 977939797847        product
    8         <NA> person14   org5        2     NA     person14       supplier
    9 928893912852  person5   org2        2 0.0004 928893912852        product
    

    Other options that require less code:

    df %>% mutate(contract_key = coalesce(product,supplier,vendor),
                  contract_level = if_else(!is.na(product), 'product', 
                                           if_else(!is.na(supplier), 'supplier', 'vendor')))
    
    df %>% mutate(contract_key = coalesce(product,supplier,vendor),
                  contract_level = apply(., 1, function(x) names(.)[min(which(!is.na(x)))]))