I want to write a function that does the same thing as the SPSS command AUTORECODE.
AUTORECODE recodes the values of string and numeric variables to consecutive integers and puts the recoded values into a new variable called a target variable.
My first attempt was this:
AUTORECODE <- function(variable = NULL){
A <- sort(unique(variable))
B <- seq(1:length(unique(variable)))
REC <- Recode(var = variable, recodes = "A = B")
return(REC)
}
But this causes an error. I think the problem is in how A and B are passed to the recodes argument. That's why I tried
eval(parse(text = paste("REC <- Recode(var = variable, recodes = 'c(",A,") = c(",B,")')")))
inside the function, but that isn't the right solution either.
Ideas?
factor may be all you need, as James suggested in a comment. It stores the values as integers behind the scenes (as seen with str) and just prints the corresponding labels. This can also be very useful because many R functions handle factors appropriately; for example, when you fit a linear model, R creates all the "dummy" variables for you.
> x <- LETTERS[c(4,2,3,1,3)]
> f <- factor(x)
> f
[1] D B C A C
Levels: A B C D
> str(f)
Factor w/ 4 levels "A","B","C","D": 4 2 3 1 3
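To see the "dummy" variables mentioned above, you can call model.matrix() yourself; lm() uses it internally to build its design matrix. This is a small sketch reusing the same x; the column names assume R's default treatment contrasts, where the first level (A) is the baseline.

```r
# Same example vector as above
x <- LETTERS[c(4, 2, 3, 1, 3)]
f <- factor(x)

# model.matrix() expands the factor into 0/1 indicator ("dummy") columns;
# with the default treatment contrasts, level A is absorbed into the intercept
mm <- model.matrix(~ f)
print(mm)
```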
If you do just need the numbers, use as.integer on the factor.
> n <- as.integer(f)
> n
[1] 4 2 3 1 3
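Wrapped up as a function, this gives roughly what the question asked for. The sketch below is only one way to do it: autorecode() is a hypothetical name chosen to echo the SPSS command, not a function from base R or any package, and the data frame and column names are made up for illustration. As with SPSS, the codes go into a new target column.

```r
# autorecode() is a hypothetical helper, not part of base R or any package.
autorecode <- function(variable) {
  # factor() orders the distinct values (numerically for numbers, by the
  # locale's collation order for strings) and numbers them 1, 2, 3, ...;
  # as.integer() then returns just those codes (NA stays NA)
  as.integer(factor(variable))
}

# Illustrative data; the recoded values are stored in new target columns
df <- data.frame(grade = c("D", "B", "C", "A", "C"),
                 score = c(10, 5, 10, 7, 5))
df$grade_code <- autorecode(df$grade)
df$score_code <- autorecode(df$score)
print(df)
```

Note that numeric input is ordered by value, not as text, so 10 gets a higher code than 5 and 7.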
An alternative solution is to use match, but if you're starting with floating-point numbers, watch out for floating-point traps. factor converts the values to character strings before matching them to levels, which effectively rounds floating-point numbers to 15 significant digits and makes floating-point traps less of a concern.
> match(x, sort(unique(x)))
[1] 4 2 3 1 3
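Here is a small illustration of that floating-point trap. It uses the classic 0.1 + 0.2 example, which is my choice of values and not from the original answer.

```r
# 0.1 + 0.2 is not exactly equal to 0.3 in double precision
x <- c(0.1 + 0.2, 0.3)

# match() compares the exact doubles, so it sees two distinct values
codes_match <- match(x, sort(unique(x)))

# factor() matches on the character form (15 significant digits),
# where both values print as "0.3", so they share a single code
codes_factor <- as.integer(factor(x))

print(codes_match)
print(codes_factor)
```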