Search code examples
rdataframemaxrecode

How to recode data frame columns depending on maximum values?


I have a dataframe with dozens of columns and a few thousand rows. I would like to recode the numeric columns that have a maximum value of 8 in such a way that 8 becomes NA, and recode the numeric columns that have a maximum value of 9 in such a way that 8 becomes NA and 9 becomes 9999. For example,

mydf <- data.frame(a = c(1, 2, 8, 9), b = c(7, 8, 9, 10), c = c(4, 5, 6, 9), d = c(5, 6, 7, 8), e = c("a", "b", "c", "d"))

> mydf
  a  b c d e
1 1  7 4 5 a
2 2  8 5 6 b
3 8  9 6 7 c
4 9 10 9 8 d

would become:

> mydf
     a  b    c  d e
1    1  7    4  5 a
2    2  8    5  6 b
3   NA  9    6  7 c
4 9999 10 9999 NA d

I thought of doing this:

mydf1 <- mydf[,sapply(mydf, max) == 8]
mydf2 <- mydf[,sapply(mydf, max) == 9]
mydf1[mydf1 == 8] <- NA
mydf2[mydf2 == 8] <- NA
mydf2[mydf2 == 9] <- 9999

but I don't know how to bring the recoded variables from the new data frames back into the original data frame -- and I'm sure there are much more efficient solutions anyways.


Solution

  • You can check max value for each column using lapply and recode if it is either 8 or 9.

    mydf[] <- lapply(mydf, function(x) {
      if(max(x) %in%  c(8, 9)) {
        x[x == 8] <- NA
        x[x == 9] <- 9999
      }
      x
    })
    mydf
    
    #     a  b    c  d e
    #1    1  7    4  5 a
    #2    2  8    5  6 b
    #3   NA  9    6  7 c
    #4 9999 10 9999 NA d