
Issue adapting a function that takes a column name as an argument


I have the following function, which does two things: 1) takes a character column and converts all characters to lowercase, 2) removes any special characters that may be present in the column.

clean_string<-function(data,variable){
  data <- data |> dplyr::mutate({{variable}} := tolower({{variable}}))
  x <- data |> dplyr::mutate({{variable}} := gsub("[^a-z]", "", {{variable}}))
  return (x)
}

Here is some dummy data to test it:

var_1<-rep(c("A","B%","C v","B","A","C"),10)
var_2<-rep(c("VAron","v Aron","muJER","Muj3er"),15)
var_3<-c(rep(c("1","0"),10),rep("0",5),rep(c("0","1","0"),10),rep("1",5) )
dat<-data.frame(var_1,var_2,var_3)
D.D_clean_1<- dat |>  clean_string( variable = var_1) |> clean_string( variable = var_2)

Now I want to use this function with several columns, so I tried passing a vector of column names as an argument. First I tried using quasiquotation inside a loop:

clean_string<-function(data,variables){
  data.table::setDT(data)
  for (j in variables){
  data <- data |> dplyr::mutate({{j}} := tolower({{j}}))
  x <- data |> dplyr::mutate({{j}} := gsub("[^a-z]", "", {{j}}))
  }
  return (x)
}

But this does not work, since I cannot build a vector of column names without quoting them (""). So I then tried replacing the quasiquotation with [[]], which meant creating a vector with the column names as strings. However, I got the following error:

Error: unexpected '[[' in:
"  for (j in variables){
  data <- data |> dplyr::mutate([["

Why does neither of my approaches work? How should I do it?


Solution

  • Using dplyr::across and ... you could do:

    library(dplyr, warn.conflicts = FALSE)
    clean_string <- function(data, ...) {
      data |>
        dplyr::mutate(
          dplyr::across(c(...), tolower),
          dplyr::across(c(...), ~ gsub("[^a-z]", "", .x))
        )
    }
    
    dat |>
      clean_string(var_1, var_2)
    #>    var_1 var_2 var_3
    #> 1      a varon     1
    #> 2      b varon     0
    #> 3     cv mujer     1
    #> 4      b mujer     0
    #> 5      a varon     1
    #> 6      c varon     0
    #> 7      a mujer     1
    #> 8      b mujer     0
    #> 9     cv varon     1
    #> 10     b varon     0
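
As for why the two attempts in the question fail: as far as I understand, `{{ }}` is meant for function arguments that carry an unquoted column name. Applied to a loop variable that holds a string, it injects the string itself, so `tolower()` sees the name rather than the column values. A bare `[[j]]` is not valid R syntax at all, because `[[` needs an object to index, hence the parse error.

If you would rather keep passing the column names as a character vector, the following is a minimal sketch of two ways that should work. The function names `clean_string_chr` and `clean_string_loop` and the small `toy` data frame are made up for illustration and are not part of the original answer.

```r
library(dplyr, warn.conflicts = FALSE)

# Variant 1: select the columns named in a character vector with all_of()
clean_string_chr <- function(data, variables) {
  data |>
    dplyr::mutate(
      dplyr::across(
        dplyr::all_of(variables),
        # lowercase first, then drop everything that is not a-z
        ~ gsub("[^a-z]", "", tolower(.x))
      )
    )
}

# Variant 2: keep the loop, but look the column up by name with .data[[j]]
# and build the output name from the string with "{j}" :=
clean_string_loop <- function(data, variables) {
  for (j in variables) {
    data <- data |>
      dplyr::mutate("{j}" := gsub("[^a-z]", "", tolower(.data[[j]])))
  }
  data
}

# Hypothetical toy data, only to try both variants
toy <- data.frame(
  var_1 = c("A", "B%", "C v"),
  var_2 = c("VAron", "v Aron", "Muj3er"),
  var_3 = c("1", "0", "1")
)

cols <- c("var_1", "var_2")
clean_string_chr(toy, cols)
clean_string_loop(toy, cols)
```

The `across()` variant is usually the shorter one; the loop is shown only because it stays closest to the code in the question.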