
Remove string in multiple variables in R


Here is an MWE of my data. I want to remove the string "NaN" from all the columns whose names contain "Med".

df= data.frame(id= rep(1:5, each=1),
               Med1 = c("GN", "GN", "Ca", "Ca", "DM"),
               Med2 = c("DM", "NaN", "Mob", "NaN", "NaN"),
               Med3 = c("NaN","NaN","DM", "NaN","NaN"))

I have tried the following:

dfx = df%>%
  mutate(across(contains("Med", ignore.case = TRUE), str_remove(.,"NaN")))
Error: Problem with `mutate()` input `..1`.
x Problem with `across()` input `.fns`.
i Input `.fns` must be NULL, a function, a formula, or a list of functions/formulas.
i Input `..1` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
Problem with `mutate()` input `..1`.
i argument is not an atomic vector; coercing
i Input `..1` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.

dfx = df%>%
  mutate(across(contains("Med", ignore.case = TRUE), str_remove("NaN")))
Error: Problem with `mutate()` input `..1`.
x argument "pattern" is missing, with no default
i Input `..1` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.

I also have a problem removing the string from just a single column, so I think I may be misunderstanding str_remove.

dfy=df%>%
 str_remove(string = Med1, pattern = "NaN")
Error in str_remove(., string = Med1, pattern = "NaN") : 
  unused argument (.)

Solution

  • Up front: add a tilde to your code:

    dfx = df%>%                                        # ,--- add this tilde
      mutate(across(contains("Med", ignore.case = TRUE), ~ str_remove(.,"NaN")))
    

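    Explanation: across takes a function as its second argument (.fns). Without the tilde, str_remove(., "NaN") is a call that is evaluated right away rather than a function to apply to each column, hence the error. The function can be expressed in a few ways: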

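    1. Raw function, such as across(everything(), mean). You can add arbitrary named/unnamed arguments afterward; they are passed on to the function in addition to the column data. (As of dplyr 1.1.0, passing extra arguments this way is deprecated in favor of an anonymous function, so the second call below gives a deprecation warning there.)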

      mtcars %>%
        mutate(across(everything(), mean))
      mtcars %>%
        mutate(across(everything(), mean, na.rm = TRUE))
      

      (This does not assume base-R functions: you can create your own function elsewhere and pass it here.)

    2. Anonymous functions, which allow more flexibility with the call. Perhaps:

      mtcars %>%
        mutate(across(everything(), function(z) mean(z)))
      mtcars %>%
        mutate(across(everything(), function(z) mean(z, na.rm = TRUE)))
      
    3. rlang-style tilde functions. In these, a . (or .x) stands for the vector of data (one column at a time, for each column being mutated):

      mtcars %>%
        mutate(across(everything(), ~ mean(.)))
      mtcars %>%
        mutate(across(everything(), ~ mean(., na.rm = TRUE)))
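
Applied to the data from the question, the three forms above might look like the sketch below. It assumes dplyr and stringr are installed; drop_nan is a hypothetical helper name introduced only for this illustration, and the data frame is rebuilt so the block runs on its own.

```r
library(dplyr)
library(stringr)

# Same data as in the question
df <- data.frame(id = 1:5,
                 Med1 = c("GN", "GN", "Ca", "Ca", "DM"),
                 Med2 = c("DM", "NaN", "Mob", "NaN", "NaN"),
                 Med3 = c("NaN", "NaN", "DM", "NaN", "NaN"))

# Hypothetical helper, defined only for this illustration
drop_nan <- function(x) str_remove(x, "NaN")

# 1. A named function passed as-is
out1 <- df %>% mutate(across(contains("Med"), drop_nan))

# 2. An anonymous function
out2 <- df %>% mutate(across(contains("Med"), function(z) str_remove(z, "NaN")))

# 3. A tilde (formula) function, as in the fix at the top
out3 <- df %>% mutate(across(contains("Med"), ~ str_remove(., "NaN")))

# The three results should be the same data frame
identical(out1, out2) && identical(out2, out3)
```

Which form to use is mostly a matter of taste; the anonymous and tilde forms are handy when the function needs extra arguments such as the pattern here.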
      

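    Of course, you don't need stringr to do this task. Note that the base-R version below checks every column, not only the Med ones, and blanks cells that are exactly "NaN" rather than removing "NaN" from within a longer string.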

    df
    #   id Med1 Med2 Med3
    # 1  1   GN   DM  NaN
    # 2  2   GN  NaN  NaN
    # 3  3   Ca  Mob   DM
    # 4  4   Ca  NaN  NaN
    # 5  5   DM  NaN  NaN
    df[df == "NaN"] <- ""
    df
    #   id Med1 Med2 Med3
    # 1  1   GN   DM     
    # 2  2   GN          
    # 3  3   Ca  Mob   DM
    # 4  4   Ca          
    # 5  5   DM
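
On the single-column attempt in the question: the pipe passes df as the first argument to str_remove, but string and pattern are already named, so the piped data frame has nowhere to go (hence "unused argument (.)"). Also, Med1 is only visible as a column inside a verb such as mutate. A minimal sketch of one way around this, assuming dplyr and stringr are loaded and using Med2 instead of Med1 so that the effect is visible:

```r
library(dplyr)
library(stringr)

# Same data as in the question
df <- data.frame(id = 1:5,
                 Med1 = c("GN", "GN", "Ca", "Ca", "DM"),
                 Med2 = c("DM", "NaN", "Mob", "NaN", "NaN"),
                 Med3 = c("NaN", "NaN", "DM", "NaN", "NaN"))

# Inside mutate(), Med2 refers to the column of df
dfy <- df %>%
  mutate(Med2 = str_remove(Med2, "NaN"))

# Base-R alternative for a single column: sub() with a fixed (non-regex) pattern
dfz <- df
dfz$Med2 <- sub("NaN", "", dfz$Med2, fixed = TRUE)
```

Both versions leave the other columns untouched.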