I am trying to coerce numeric columns in a data frame to factors. The coercion works OK except that instead of the labels I specify I get a numeric label for each row of the data frame. There are no error messages.
I've tried tidyverse and base approaches; coerced the target vector to character (and even to integer) before coercing to factor; run the same code over a tibble rather than a data frame just in case it was to do with the row names. And I have searched here and other R-related parts of the internet.
I feel sure I am missing something obvious here, but as happens when one looks at a problem for too long, I just can't see it.
df <- data.frame("a" = c(1, 2, 2), "b" = c(2, 1, 1), row.names = NULL, stringsAsFactors = FALSE)
df$a <- factor(df$a, levels = c("1", "2"), labels = c("yes", "no"))
# coercion to factor worked:
class(df$a)
#> [1] "factor"
typeof(df$a)
#> [1] "integer"
levels(df$a)
#> [1] "yes" "no"
labels(df$a) # same as no. rows in df. Add rows and more labels appear.
#> [1] "1" "2" "3"
df$a
#> [1] yes no no
#> Levels: yes no
Created on 2020-09-24 by the reprex package (v0.3.0)
We can look at the structure of df$a using dput():
dput(df$a)
#> structure(c(1L, 2L, 2L), .Label = c("yes", "no"), class = "factor")
You can see that it is indeed a factor with the appropriate labels. The labels() function that you are using does not return the .Label element of a factor. It is unrelated to factors, and I think you are just confused by the name. For an unnamed vector, labels() simply returns the element positions as a character vector the same length as the input, whatever the class is. For example:
labels(5:10)
#> [1] "1" "2" "3" "4" "5" "6"
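To make the contrast concrete, here is a minimal sketch comparing labels() and levels() on the same objects. The named vector x and the variable names are made up for illustration and are not from the question; the factor mirrors the one in the question.

```r
# Hypothetical named vector, for illustration only
x <- c(first = 10, second = 20)
named_labels <- labels(x)         # the names, because x has names
named_labels
#> [1] "first"  "second"

# Same construction as in the question
f <- factor(c(1, 2, 2), levels = c(1, 2), labels = c("yes", "no"))

position_labels <- labels(f)      # no names, so just the positions 1..length(f)
position_labels
#> [1] "1" "2" "3"

factor_levels <- levels(f)        # the labels supplied to factor()
factor_levels
#> [1] "yes" "no"

# The attribute that levels() reads is stored under the name "levels"
has_levels_attr <- "levels" %in% names(attributes(f))
has_levels_attr
#> [1] TRUE
```

So labels() describes the elements of a vector (their names or positions), while levels() describes the categories of a factor.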
So there is nothing wrong with your newly created factor. The function that returns the labels you supplied is levels(), which is admittedly confusing. The .Label name in the dput() output is a historical spelling used when a factor is deparsed; the underlying attribute is named levels, which is why levels() is the accessor.
The levels argument of factor() specifies which values of the input vector become levels, and in what order. If you omit it, the sorted unique values are used. In your case, the levels argument of the factor() call is therefore redundant:
df <- data.frame("a" = c(1, 2, 2), "b" = c(2, 1, 1), row.names = NULL)
factor(df$a, labels = c("yes", "no"))
#> [1] yes no no
#> Levels: yes no
We would only need it if we wanted to change the order of the levels or drop some of them:
factor(df$a, levels = "2", labels = "no")
#> [1] <NA> no no
#> Levels: no
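The other common use of the levels argument is to control level order. A minimal sketch, assuming we want "no" to come first; the input values are the same as in the question, and the variable names are made up for illustration:

```r
a <- c(1, 2, 2)

# levels and labels are matched by position: 2 -> "no", 1 -> "yes"
reordered <- factor(a, levels = c(2, 1), labels = c("no", "yes"))
reordered
#> [1] yes no  no
#> Levels: no yes

reordered_levels <- levels(reordered)
reordered_codes <- as.integer(reordered)   # codes follow the new level order
reordered_codes
#> [1] 2 1 1
```

The displayed values are unchanged, but the level order (and so the underlying integer codes) now follows the order given in levels.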
I think you may have been looking for:
as.numeric(df$a)
#> [1] 1 2 2
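to get the numbers back (on a factor, as.numeric() returns the integer level codes, which here coincide with the original values 1 and 2).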
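As a hedged illustration of what as.numeric() does to a factor, the sketch below uses made-up values 10 and 20 (not from the question), chosen so that the integer codes and the underlying values differ. The variable names are hypothetical.

```r
f <- factor(c(10, 20, 20))      # levels default to "10" and "20"

codes <- as.integer(f)          # positions of each value's level
codes
#> [1] 1 2 2

# Two common ways to recover the values encoded in the level names
values_via_character <- as.numeric(as.character(f))
values_via_levels <- as.numeric(levels(f))[f]
values_via_levels
#> [1] 10 20 20
```

Both recovery idioms only work when the level names are themselves numbers; with labels such as "yes" and "no" the original values cannot be recovered from the factor alone.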
However, there is no error. Your coercion is correct and works exactly as intended. It is only your understanding of what the labels() function is supposed to do that is causing a problem.