
Assign specific values to unique values in all dataframe columns in R


I have a data frame with many columns, and each column takes at most 3 distinct values. These 3 values differ from column to column, and some columns contain NA. For example:

df = data.frame(
  "a" = c(13, 33, 11, 33),
  "b" = c(11, 11, 14, 11),
  "c" = c(44, 22, NA, 24)
)
       a  b  c
    1 13 11 44
    2 33 11 22
    3 11 14 NA
    4 33 11 24

Within each column, every distinct value should be recoded as 0, 1, or 2: "1" when the value is made up of two different digits (e.g. 13), and "0" or "2" when both digits are the same (e.g. 11 or 33). NAs should be kept. Like this:

   a  b  c
1  1  0  0
2  2  0  2
3  0  1  NA
4  2  0  1

Which of the two same-digit values gets "0" and which gets "2" does not matter, as long as the assignment is consistent within a column.


Solution

  • sapply(df, \(x) 1+(x%%11==0) - 2*(x==min(x[x%%11==0], na.rm=TRUE)))
         a b  c
    [1,] 1 0  2
    [2,] 2 0  0
    [3,] 0 1 NA
    [4,] 2 0  1
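
To see why the expression works, it can help to unpack it for a single column. The following is only a sketch of the same arithmetic; the helper name `label_column` is made up for illustration and is not part of the original answer.

```r
# Hypothetical helper: the one-liner's arithmetic, spelled out step by step.
label_column <- function(x) {
  # TRUE for same-digit values such as 11, 22, 33, 44; NA stays NA
  is_doubled <- x %% 11 == 0
  # the smaller of the same-digit values present in this column
  smallest <- min(x[is_doubled], na.rm = TRUE)
  # mixed digits -> 1, smaller same-digit value -> 0, other same-digit value -> 2
  1 + is_doubled - 2 * (x == smallest)
}

label_column(c(13, 33, 11, 33))
label_column(c(44, 22, NA, 24))
```

Note that this assumes every column contains at least one same-digit value; otherwise `min()` is called on an empty vector and returns `Inf` with a warning.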
    

    If the unique values are always XX, XY, and YY (but never YX) where X<Y, then XX is always the column minimum, so the subsetting inside min() can be dropped and the above simplifies to:

    sapply(df, \(x) 1+(x%%11==0) - 2*(x==min(x, na.rm=TRUE)))
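
`sapply()` returns a matrix, as the `[1,]` row labels in the output above show. If a data frame is wanted instead, one possible variation (an assumption about what is needed, not something the question asks for) is to use `lapply()` and assign into `out[]`, which keeps the data frame class and column names. The variable name `out` is illustrative.

```r
df <- data.frame(
  a = c(13, 33, 11, 33),
  b = c(11, 11, 14, 11),
  c = c(44, 22, NA, 24)
)

# Copy df, then overwrite its columns in place so the result stays a data frame.
out <- df
out[] <- lapply(df, \(x) 1 + (x %% 11 == 0) - 2 * (x == min(x, na.rm = TRUE)))
out
```

The `\(x)` shorthand used here and in the answer requires R 4.1 or later; on older versions it can be written as `function(x)`.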