Tags: r, purrr, dplyr, names

Change values in a data frame according to column name with dplyr?


My problem seemed really easy, but I can't figure out an easy solution. All categorical variables in my dataset share the value "missing". In order to join results later on with a function of my own, I need this value to be unique per column, so what I want is to replace the value "missing" with "missing (var_name)".

I first tried something like:

data %>% mutate(across(where(is.character),
                       ~ replace(., . == "missing", paste("missing", SOMETHING(.)))))

This doesn't quite work, because I lack this SOMETHING function: a way to access the column name inside the across() call using only the "." argument...

The other solution I tried is using

purrr::imap(data %>% select(where(is.character)),
            ~ replace(.x, .x == "missing", paste("missing", .y)))

This is close to what I want, but then I have trouble reinserting the purrr::imap output into my initial data frame, easily and in a computationally efficient way, in place of the original character columns.

I think I need a break and/or some help to see more clearly, because I am kind of tired of fighting with something which appears to be so simple...

I would rather use the dplyr solution, but the purrr one is OK. Actually, whatever works well and quickly will do (just so you know, I have more than 600 columns and 150,000 rows).

Any help or advice is welcome!

Thanks


Solution

  • Example Data

    df <- data.frame(var.X = c("a", "missing", "a"),
                     var.Y = c("b", "b", "missing"),
                     var.Z = c("missing", "missing", "c"))
    
    #     var.X   var.Y   var.Z
    # 1       a       b missing
    # 2 missing       b missing
    # 3       a missing       c
    

    With dplyr, you can use cur_column() inside across(). From ?context:

    cur_column() gives the name of the current column (in across() only).

    library(dplyr)
    
    df %>%
      mutate(across(where(is.character),
                    ~ recode(.x, missing = paste0("missing(", cur_column(), ")"))))
    
    #            var.X          var.Y          var.Z
    # 1              a              b missing(var.Z)
    # 2 missing(var.X)              b missing(var.Z)
    # 3              a missing(var.Y)              c
    

    or

    df %>%
      mutate(across(where(is.character),
                    ~ recode(.x, missing = sprintf("missing(%s)", cur_column()))))
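
The purrr attempt from the question can also be made to work. Its output can be written back by assigning the list returned by imap() to the same set of columns. The following is only a sketch under some assumptions: the `id` column and the helper name `chr_cols` are added here for illustration and are not part of the original data, and the label format "missing(var)" follows the dplyr examples above.

```r
library(purrr)

# Example data as above, plus a hypothetical numeric column that must stay untouched.
df <- data.frame(id    = 1:3,
                 var.X = c("a", "missing", "a"),
                 var.Y = c("b", "b", "missing"),
                 var.Z = c("missing", "missing", "c"),
                 stringsAsFactors = FALSE)

# Names of the character columns only (chr_cols is an illustrative helper name).
chr_cols <- names(df)[vapply(df, is.character, logical(1))]

# imap() passes each column as .x and its name as .y. Assigning the resulting
# named list back to df[chr_cols] overwrites those columns in place and keeps
# the original column order.
df[chr_cols] <- imap(df[chr_cols],
                     ~ replace(.x, .x == "missing", paste0("missing(", .y, ")")))

df
```

Because only the selected columns are reassigned, non-character columns are left as they were. Whether this is faster than the across() version on 600 columns and 150,000 rows has not been measured here.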