How to call column names from an object in dplyr?


I am trying to replace all zeros in multiple columns with NA using dplyr. However, since I have many variables, I do not want to list them one by one, but would rather store them in an object that I can refer to afterwards.

This is a minimal example of what I did:

library(dplyr)

Data <- data.frame(var1 = c(1:10), var2 = rep(c(0, 4), 5), var3 = rep(c(2, 0, 3, 4, 5), 2), var4 = rep(c(7, 0), 5))

col <- Data[,c(2:4)]

Data <- Data %>%
  mutate(across(col, na_if, 0))

However, if I do this, I get the following error message:

Error: Problem with 'mutate()' input '..1'.
x Must subset columns with a valid subscript vector.
x Subscript has the wrong type 'data.frame<
  var2: double
  var3: double
  var4: double>'.
i It must be numeric or character.
i Input '..1' is '(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...'.

I have tried to change the format of col to a tibble, but that did not help.

Could anyone tell me how to make this work?


Solution

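  • Here, col should hold the column names of Data, not the columns themselves: across() expects a selection (names or positions), whereas Data[,c(2:4)] is a data frame, which is what the error message complains about. Since col is also the name of a base R function, give the object a different name, wrap it in all_of(), and replace 0 with NA inside across():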

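    library(dplyr)
    # store the column names (a character vector), not the columns themselves
    col1 <- names(Data)[2:4]
    # all_of() selects the columns named in col1; na_if() turns each 0 into NA
    Data <- Data %>%
       mutate(across(all_of(col1), ~ na_if(.x, 0)))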
    

    Output:

    Data
    #   var1 var2 var3 var4
    #1     1   NA    2    7
    #2     2    4   NA   NA
    #3     3   NA    3    7
    #4     4    4    4   NA
    #5     5   NA    5    7
    #6     6    4    2   NA
    #7     7   NA   NA    7
    #8     8    4    3   NA
    #9     9   NA    4    7
    #10   10    4    5   NA
    

    NOTE: The OP asked about selecting the columns by either index or name. names(Data)[2:4] covers both: the index picks the columns, and the stored result is their names.
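
To make the index-versus-name point concrete, here is a minimal, self-contained sketch that rebuilds the toy data from the question and applies the same replacement twice: once with a stored character vector and once with stored integer positions. The object names (bad_sel, name_sel, idx_sel, by_name, by_index) are made up for illustration. The lambda form ~ na_if(.x, 0) is used on the assumption of a recent dplyr, where passing extra arguments through across() is deprecated.

```r
library(dplyr)

# Same toy data as in the question
Data <- data.frame(
  var1 = 1:10,
  var2 = rep(c(0, 4), 5),
  var3 = rep(c(2, 0, 3, 4, 5), 2),
  var4 = rep(c(7, 0), 5)
)

# The original attempt stored the columns themselves (a data frame),
# which across() cannot use as a column selection
bad_sel <- Data[, 2:4]
class(bad_sel)  # "data.frame"

# Selection by name: a character vector wrapped in all_of()
name_sel <- names(Data)[2:4]
by_name <- Data %>%
  mutate(across(all_of(name_sel), ~ na_if(.x, 0)))

# Selection by position: an integer vector wrapped in all_of()
idx_sel <- 2:4
by_index <- Data %>%
  mutate(across(all_of(idx_sel), ~ na_if(.x, 0)))

# Both routes should give the same result
identical(by_name, by_index)
```

Storing names is usually the more robust choice, since positions silently point at different columns if the data frame is later reordered.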