Search code examples
rtidyverserecodeacross

tidyverse solution for recoding several columnsinto new ones where column name suffixes increase by one


This is only pseudo code, but let's assume I have 22 columns named from Q46x47_1 to Q46x47_22.

I now want to apply the following recode logic:

if Q46x47_1=0 Q46x47_2=0. 
if Q46x47_3=0 Q46x47_4=0. 
if Q46x47_5=0 Q46x47_6=0. 
if Q46x47_7=0 Q46x47_8=0. 
if Q46x47_9=0 Q46x47_10=0. 
if Q46x47_11=0 Q46x47_12=0. 
if Q46x47_13=0 Q46x47_14=0. 
if Q46x47_15=0 Q46x47_16=0. 
if Q46x47_17=0 Q46x47_18=0. 
if Q46x47_19=0 Q46x47_20=0. 
if Q46x47_21=0 Q46x47_22=0.

Of course, I don't want to type this by hand. So I was wondering if there's an elegant way to bulk recode this. I know of across, but I don't see a way how to pass each second variable with the increased suffix number.

So I started with:

df %>%
  mutate(across(num_range(prefix = "Q46x47_", range = seq(1, 21, 2)), ~if_else(. == 0, ..., .)))

My problem is (given if_else could be used here) how to specify the ... part in the if_else command where I need to pass the LHS column names.

Any ideas?


Short reproducible example with a subset of columns:

df <- data.frame(Q46x47_1 = c(1, 2, 0, 1, 0),
                 Q46x47_2 = c(1, 2, 3, 1, 0),
                 Q46x47_3 = c(1, 2, 0, 1, 0),
                 Q46x47_4 = c(1, 2, 3, 1, 0),
                 Q46x47_5 = c(1, 2, 0, 1, 0),
                 Q46x47_6 = c(1, 2, 3, 1, 0))

Solution

  • In base R, this is more easier

    nm1 <- paste0("Q46x47_", seq(1, 21, by = 2))
    nm2 <- paste0("Q46x47_", seq(2, 22, by = 2))
    i1 <- df[nm1] == 0
    df[nm2][i1] <- 0
    

    For the example provided

    nm1 <- paste0("Q46x47_", seq(1, 5, by = 2))
    nm2 <- paste0("Q46x47_", seq(2, 6, by = 2))
    

    If we need a dplyr option, an opiton is to get the column name when we loop across the even columns specified in 'nm2', then extract the suffix digits at the end, convert to numeric, subtract 1 to create the odd column name, get the value of the column and do the replacement

    library(dplyr)
    library(stringr)
    df %>%
       mutate(across(all_of(nm2), ~
                    {
                 tmp <- cur_column()
                 replace(., get(str_c('Q46x47_', 
                as.integer(str_extract(tmp, "\\d+$"))-1)) == 0, 0)
    
               }))
    

    -output

    # Q46x47_1 Q46x47_2 Q46x47_3 Q46x47_4 Q46x47_5 Q46x47_6
    #1        1        1        1        1        1        1
    #2        2        2        2        2        2        2
    #3        0        0        0        0        0        0
    #4        1        1        1        1        1        1
    #5        0        0        0        0        0        0