I'm struggling to find the connection between the numeric (integer) values stored in an R factor object and its level labels. I know how to define the levels and the labels, but suppose I get an unfamiliar data set that contains several factors (here: sex and color):
test <- data.frame(
  factor(c(1, 2, 1, 1, 2, 2, 1),
         levels = c(1, 2),
         labels = c("female", "male")
  ),
  factor(c(3, 2, 2, 1, 4, 4, 5),
         levels = c(1, 2, 3, 4, 5),
         labels = c("red", "green", "blue", "yellow", "brown")
  )
)
names(test) <- c("sex", "color")
test
     sex  color
1 female   blue
2   male  green
3 female  green
4 female    red
5   male yellow
6   male yellow
7 female  brown
I can obtain the level labels by using attributes(),
and I can obtain the numeric values, e.g. by using test$sex <- as.numeric(test$sex)
But how do I know that 1 equals female and 2 equals male? The same problem (even worse) applies to the colors. How do I establish the connection?
Thanks
As others have said, the integer value is simply the position of the label within levels(): the first level is stored as 1, the second as 2, and so on. Personally, I find this easiest to visualize in a reference table.
test <- data.frame(
  sex = factor(c(1, 2, 1, 1, 2, 2, 1),
               levels = c(1, 2),
               labels = c("female", "male")
  ),
  color = factor(c(3, 2, 2, 1, 4, 4, 5),
                 levels = c(1, 2, 3, 4, 5),
                 labels = c("red", "green", "blue", "yellow", "brown")
  )
)
# Make a reference table
data.frame(level = seq_along(levels(test$color)),
           label = levels(test$color))
  level  label
1     1    red
2     2  green
3     3   blue
4     4 yellow
5     5  brown
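To check that the integer code really is just an index into levels(), here is a minimal sketch. The names color, codes and labels_from_codes are my own and are not part of the question's data frame.

```r
# Rebuild the color factor from the question as a standalone vector
color <- factor(c(3, 2, 2, 1, 4, 4, 5),
                levels = c(1, 2, 3, 4, 5),
                labels = c("red", "green", "blue", "yellow", "brown"))

# Internal integer codes, one per observation
codes <- as.integer(color)

# Indexing levels() by the code recovers the label of each observation
labels_from_codes <- levels(color)[codes]

# Compare the recovered labels with what the factor itself reports
identical(labels_from_codes, as.character(color))
```

Because the lookup goes through levels(), it works the same way on an unfamiliar factor whose original definition you never saw.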
If you want to get the references for all of the factors in a data frame, you can wrap the code in a function and apply it to every column:
factor_reference <- function(data)
{
  Ref <-
    lapply(data,
           function(x)
           {
             if (is.factor(x)) data.frame(level = seq_along(levels(x)),
                                          label = levels(x))
             else NULL
           }
    )
  Ref[!vapply(Ref, is.null, logical(1))]
}
factor_reference(test)
$sex
  level  label
1     1 female
2     2   male

$color
  level  label
1     1    red
2     2  green
3     3   blue
4     4 yellow
5     5  brown
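One caveat when reading such a table: the level column is the position in levels(), which only coincides with the raw data values when those happened to be 1, 2, 3, and so on. A small sketch with a made-up variable (smoker is hypothetical and not in the question's data) whose raw values were coded 0/1:

```r
# Hypothetical factor built from raw values coded 0/1
smoker <- factor(c(0, 1, 1, 0),
                 levels = c(0, 1),
                 labels = c("no", "yes"))

# The stored codes are positions in levels(smoker), not the raw 0/1 values
smoker_codes <- as.integer(smoker)
smoker_codes

# The labels those positions refer to, in order
levels(smoker)
```

So in an unfamiliar data set the reference table tells you what the stored integers mean, but the original coding can only be recovered if it was documented elsewhere.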