Search code examples
rdplyrcasemutate

Problem with case_when in mutate function R


I have the following data frame called result.

MANUAL.ID AUTO.ID loc
NA PYPPYP L2
PIPpip NA L1
Barbar NA L5
NA Pippip L3
NA Pippip,BerBar L3

I try to replace all the different forms of a same tag by the right one. For example replace all PIPPIP and PIPpip by Pippip or Berbar by Barbar. To do this, I use a mutate function with case_when based on a required file called tesaurus which have column with all the possible case of a same tag (tag_id) and a column with the correct one (tag_ok) which looks like this :

tag_id tag_ok
PYPPYP Pippip
PIPpip Pippip
Pippip Pippip
Barbar Barbar

I use the following code.

library("plyr")
library("dplyr")
library("reshape")
library("data.table")
library("stringr")
library("tidyr")
library("openxlsx")

tesaurus <- read.xlsx("Requested_files/sp_tesaurus.xlsx", sheet = "tesaurus") %>%
  select(-bat_sp)

result <- result %>%
  mutate_at(
    vars(contains("ID")),
    list(as.formula(
      paste0(
        "~ case_when(",
        paste(
          sprintf(
            "str_detect(., pattern = regex('%s', ignore_case = TRUE)) ~ '%s'",
            tesaurus$tag_id,
            tesaurus$tag_ok
          ),
          collapse = ", "
        ), ", TRUE ~ .)"
      )
    ))
  )

My problem is that when I run it I have this error message :

Error in `mutate()`:
ℹ In argument: `AUTO.ID = (structure(function (..., .x = ..1, .y = ..2, . = ..1) ...`.
Caused by error in `case_when()`:
! Can't combine `..1 (right)` <character> and `..161 (right)` <double>.
---
Backtrace:
     ▆
  1. ├─result %>% ...
  2. ├─dplyr::mutate_at(...)
  3. │ ├─dplyr::mutate(.tbl, !!!funs)
  4. │ └─dplyr:::mutate.data.frame(.tbl, !!!funs)
  5. │   └─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
  6. │     ├─base::withCallingHandlers(...)
  7. │     └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
  8. │       └─mask$eval_all_mutate(quo)
  9. │         └─dplyr (local) eval()
 10. ├─`<inln_cl_>`(AUTO.ID)
 11. │ └─rlang::eval_bare(`_quo`, base::parent.frame())
 12. ├─rlang (local) case_when(...)
 13. └─dplyr::case_when(...)

So, what is the problem and how can I fix it ?


Solution

  • I think a join/coalesce method will work here. Updated with your new data (i.e., "PYPPYP" to "Pippip"):

    result %>%
      left_join(tesaurus, by = c(MANUAL.ID = "tag_id")) %>%
      mutate(MANUAL.ID = coalesce(tag_ok, MANUAL.ID)) %>%
      select(-tag_ok) %>%
      left_join(tesaurus, by = c(AUTO.ID = "tag_id")) %>%
      mutate(AUTO.ID = coalesce(tag_ok, AUTO.ID)) %>%
      select(-tag_ok)
    #   MANUAL.ID AUTO.ID loc
    # 1      <NA>  Pippip  L2
    # 2    Pippip    <NA>  L1
    # 3    Barbar    <NA>  L5
    # 4      <NA>  Pippip  L3
    

    or we can use match (though I think it's not better than above):

    result %>%
      mutate(across(c(MANUAL.ID, AUTO.ID),
        ~ coalesce(tesaurus$tag_ok[match(., tesaurus$tag_id)], .)))
    #   MANUAL.ID AUTO.ID loc
    # 1      <NA>  Pippip  L2
    # 2    Pippip    <NA>  L1
    # 3    Barbar    <NA>  L5
    # 4      <NA>  Pippip  L3
    

    Your added example with "Pippip,BerBar" is going to do nothing since you did not include BerBar in the tesaurus lookup dictionary. However, if we add it:

    tesaurus <- bind_rows(tesaurus, data.frame(tag_id="BerBar", tag_ok="Barbar"))
    tesaurus
    #   tag_id tag_ok
    # 1 PYPPYP Pippip
    # 2 PIPpip Pippip
    # 3 Pippip Pippip
    # 4 Barbar Barbar
    # 5 BerBar Barbar
    

    we can use strsplit and the match lookup as follows:

    result %>%
      mutate(across(c(MANUAL.ID, AUTO.ID),
         ~ sapply(strsplit(., "[, ]+"), function(st) paste(coalesce(tesaurus$tag_ok[match(st, tesaurus$tag_id)], st), collapse = ","))))
    #   MANUAL.ID       AUTO.ID loc
    # 1        NA        Pippip  L2
    # 2    Pippip            NA  L1
    # 3    Barbar            NA  L5
    # 4        NA        Pippip  L3
    # 5        NA Pippip,Barbar  L3