How to replace values in multiple columns according to conditions in R


I am having trouble writing code that replaces all specified values in multiple columns with new values. The data frame has 20+ columns and I only want to change values in 8 of them (col1, col2, col3, etc.). I want to replace the values (4, 5, 6, 7) with (0, -1, -2, -3) respectively. I have very limited knowledge of R and programming, and I have only been able to get a solution that does the job for one column.

I have read many solutions to similar questions on here, but I could not find one that works for me. Here is my code:

data$col1[raw_data$col1 == 4] <- 0
data$col1[raw_data$col1 == 5] <- -1
data$col1[raw_data$col1 == 6] <- -2
data$col1[raw_data$col1 == 7] <- -3

This works well for one column. Can I do it at once for all columns?

Here is a snippet of how the columns and values look: [image: dataframe]


Solution

  • Set up an example:

    demodf <- data.frame(
      col1 = 1:10,
      col2 = 3:12,
      col3 = 5:14,
      col4 = 7:16
    )
    
    cols_to_amend <- c("col1", "col3")
    

    Replace just the relevant columns:

    demodf[cols_to_amend] <- apply(demodf[cols_to_amend], 2, FUN = \(x) sapply(x, \(y) if (y %in% 4:7) 4-y else y))
    

    This gives:

       col1 col2 col3 col4
    1     1    3   -1    7
    2     2    4   -2    8
    3     3    5   -3    9
    4     0    6    8   10
    5    -1    7    9   11
    6    -2    8   10   12
    7    -3    9   11   13
    8     8   10   12   14
    9     9   11   13   15
    10   10   12   14   16
    

    Explanation:

    # we can use the vector of column names to choose which columns we are replacing:
    demodf[cols_to_amend]
    # we then use `apply` with `MARGIN = 2` to apply a function to each column of this data frame:
    <- apply(demodf[cols_to_amend], 2,
    # The function we apply is an anonymous function (`\( )`) taking one column at a time as its input:
    FUN = \(x)
    # and it uses `sapply` to go down that column, performing the following on each item:
    sapply(x, \(y) if (y %in% 4:7) 4-y else y))
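
The column-by-column idea explained above can also be sketched without the inner `sapply`, which may help when the replacements do not follow a simple formula like `4 - y`. This is only a sketch under stated assumptions: `old_vals` and `new_vals` are hypothetical names introduced here for illustration, and the lookup uses base R's `match()` instead of the arithmetic from the answer.

```r
# Same example data as in the answer
demodf <- data.frame(
  col1 = 1:10,
  col2 = 3:12,
  col3 = 5:14,
  col4 = 7:16
)
cols_to_amend <- c("col1", "col3")

# Hypothetical lookup vectors: old_vals[i] is replaced by new_vals[i]
old_vals <- c(4, 5, 6, 7)
new_vals <- c(0, -1, -2, -3)

# lapply() visits each selected column; the work inside is vectorised
demodf[cols_to_amend] <- lapply(demodf[cols_to_amend], function(x) {
  idx <- match(x, old_vals)             # position in old_vals, NA if not listed
  ifelse(is.na(idx), x, new_vals[idx])  # keep x where there is no match
})
```

Because every value is looked up against the original column in one pass, a replaced value is never replaced a second time, and columns outside `cols_to_amend` are left untouched.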
    

    dplyr version:

    library(dplyr)
    
    demodf |> 
      mutate(
        across(all_of(cols_to_amend),
               ~ ifelse(.x %in% 4:7, 4-.x, .x)
               )
        )
    

    dplyr version 2:

    This is more complex than the toy example needs, but it allows replacements that cannot be expressed as simple arithmetic:

    demodf |> 
      mutate(
        across(all_of(cols_to_amend),
               ~ case_when(.x == 4 ~ 0,
                           .x == 5 ~ -1,
                           .x == 6 ~ -2,
                           .x == 7 ~ -3,
                           .default = .x)
               )
        )
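
Note that the dplyr pipelines above return a new data frame and leave `demodf` unchanged, so the result has to be assigned to keep it. A minimal sketch, assuming dplyr 1.1.0 or later (needed for the `.default` argument of `case_when`) and using `recoded` as a hypothetical name for the result:

```r
library(dplyr)

# Same example data as in the answer
demodf <- data.frame(
  col1 = 1:10,
  col2 = 3:12,
  col3 = 5:14,
  col4 = 7:16
)
cols_to_amend <- c("col1", "col3")

# Assign the result: mutate() returns a modified copy of demodf
recoded <- demodf |>
  mutate(
    across(all_of(cols_to_amend),
           ~ case_when(.x == 4 ~ 0,
                       .x == 5 ~ -1,
                       .x == 6 ~ -2,
                       .x == 7 ~ -3,
                       .default = .x)
           )
    )
```

To overwrite the original instead, assign back to the same name (`demodf <- demodf |> mutate(...)`).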