I have a data frame with many columns, and each column has 3 possible values. These 3 unique values differ from column to column, and some columns contain NAs. For example:
df = data.frame(
"a" = c(13, 33, 11, 33),
"b" = c(11, 11, 14, 11),
"c" = c(44, 22, NA, 24)
)
   a  b  c
1 13 11 44
2 33 11 22
3 11 14 NA
4 33 11 24
Within each column, each unique value should be recoded as 0, 1, or 2: 1 for a value made of two different digits (e.g. 13), and 0 or 2 for a value made of the same digit twice (e.g. 11 or 33). NAs should stay NA. The desired result looks like this:
  a b  c
1 1 0  0
2 2 0  2
3 0 1 NA
4 2 0  1
Which of the two same-digit values gets 0 and which gets 2 does not matter, as long as the choice is consistent within a column.
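Two-digit values whose digits are identical (11, 22, 33 and so on up to 99) are exactly the two-digit multiples of 11, so x %% 11 == 0 flags them. Among those, the smallest value in each column is coded 0 and the others 2, while values with two different digits are coded 1:
sapply(df, \(x) 1+(x%%11==0) - 2*(x==min(x[x%%11==0], na.rm=TRUE)))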
     a b  c
[1,] 1 0  2
[2,] 2 0  0
[3,] 0 1 NA
[4,] 2 0  1
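Here is the same one-liner unpacked into a named helper, with one step per line. `encode_col` is a hypothetical name, and the sketch assumes every non-NA value is a two-digit number with digits 1 to 9 (true for the sample data), which is what makes `x %% 11 == 0` equivalent to "both digits are equal"; the first `stopifnot()` checks that assumption. The `\(x)` lambda syntax needs R 4.1.0 or later.

```r
# Sample data from the question
df <- data.frame(
  a = c(13, 33, 11, 33),
  b = c(11, 11, 14, 11),
  c = c(44, 22, NA, 24)
)

# Assumption check: for two-digit numbers with digits 1-9, "both digits equal"
# and "multiple of 11" select exactly the same values
v <- as.vector(outer(1:9, 1:9, \(tens, units) 10 * tens + units))
stopifnot(identical(v %/% 10 == v %% 10, v %% 11 == 0))

# Hypothetical helper: same arithmetic as the one-liner, one step per line
encode_col <- function(x) {
  same <- x %% 11 == 0               # TRUE for XX/YY, FALSE for XY, NA stays NA
  ref <- min(x[same], na.rm = TRUE)  # smallest same-digit value in the column
  # 1 + same is 2 for XX/YY and 1 for XY; subtracting 2 * (x == ref)
  # turns the reference value from 2 into 0. NA propagates throughout.
  1 + same - 2 * (x == ref)
}

res <- sapply(df, encode_col)
res
```

If a column has no same-digit value at all, `min()` receives an empty vector and returns `Inf` with a warning; no value equals `Inf`, so every non-NA value in that column is simply coded 1.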
If each column's unique values are always of the form XX, XY and YY (never YX) with X < Y, and XX actually occurs in every column, then XX is the column minimum, so the above simplifies to:
sapply(df, \(x) 1+(x%%11==0) - 2*(x==min(x, na.rm=TRUE)))
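A quick comparison of the two versions on the sample data, plus a made-up column showing why XX has to be present for the simplification: without it, `min(x)` picks the XY value and codes it as -1. `full` and `simplified` are just illustrative variable names.

```r
df <- data.frame(
  a = c(13, 33, 11, 33),
  b = c(11, 11, 14, 11),
  c = c(44, 22, NA, 24)
)

full <- sapply(df, \(x) 1+(x%%11==0) - 2*(x==min(x[x%%11==0], na.rm=TRUE)))
simplified <- sapply(df, \(x) 1+(x%%11==0) - 2*(x==min(x, na.rm=TRUE)))

# Every sample column contains its XX value, so both versions agree here
identical(full, simplified)

# Convert the matrix back to a data frame, matching the layout in the question
as.data.frame(simplified)

# Made-up column with XY and YY but no XX: min(x) is the XY value 13
x <- c(13, 33, 33)
1+(x%%11==0) - 2*(x==min(x, na.rm=TRUE))            # -1  2  2
1+(x%%11==0) - 2*(x==min(x[x%%11==0], na.rm=TRUE))  #  1  0  0
```

When you cannot guarantee that XX appears in every column, the first (longer) version is the safer choice.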