
Converting a factor to numeric after relabeling


This question is related to "Convert factor to integer" and "How to convert a factor to an integer\numeric without a loss of information", but it involves a slightly different type-coercion problem.

Those two questions seem to deal with cases where a factor is explicitly constructed from an existing numeric or integer vector, without relabeling the levels. In these cases:

f <- factor(c("1","2","1","2"))
as.numeric(levels(f))[f]

returns

# [1] 1 2 1 2
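To see why this idiom works, it can be split into its two parts: `levels(f)` holds the level strings, and indexing a vector with a factor uses the factor's integer codes. A minimal sketch (the intermediate names `lev_values`, `codes`, `res_idiom` and `res_codes` are only for illustration):

```r
f <- factor(c("1", "2", "1", "2"))

# The level strings "1", "2" parsed as numbers: 1, 2
lev_values <- as.numeric(levels(f))

# The internal integer codes of the factor: 1 2 1 2
codes <- as.integer(f)

# Indexing by a factor is equivalent to indexing by its integer codes,
# so both expressions pick level values position by position
res_idiom <- as.numeric(levels(f))[f]
res_codes <- lev_values[codes]
print(res_idiom)
```

Note that here the codes and the level values happen to coincide (both are 1 and 2), which is why this example cannot tell the two apart.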

but when I relabel the levels:

f <- factor(c("1","2","1","2"))
f <- factor(f,
            levels = c(1, 2),
            labels = c("a", "b"))
as.numeric(levels(f))[f]

I get

# [1] NA NA NA NA
# Warning message:
# NAs introduced by coercion
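A quick way to see where the NAs come from: after relabeling, `levels(f)` holds the new labels rather than the old numbers, while the integer codes are left untouched. A small sketch reproducing the example above (`lab_values` is a hypothetical name; the coercion warning is suppressed only to keep the output short):

```r
f <- factor(c("1", "2", "1", "2"))
f <- factor(f, levels = c(1, 2), labels = c("a", "b"))

print(levels(f))      # the labels "a", "b" have replaced "1", "2"
print(as.integer(f))  # the codes are unchanged: 1 2 1 2

# The old values are no longer stored in f, so parsing the labels
# as numbers yields NA for every level
lab_values <- suppressWarnings(as.numeric(levels(f)))
print(lab_values)
```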

whereas

as.numeric(f)

returns

# [1] 1 2 1 2

What is the right procedure in such a case to get the original values back? Is it just as.numeric(f)?

In case it's relevant:

> sessionInfo()
R version 3.1.2 RC (2014-10-28 r66890)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_IE.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_IE.UTF-8        LC_COLLATE=en_IE.UTF-8
 [5] LC_MONETARY=en_IE.UTF-8    LC_MESSAGES=en_IE.UTF-8
 [7] LC_PAPER=en_IE.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_IE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
loaded via a namespace (and not attached):
[1] tools_3.1.2

Solution

  • If you know for certain that there is an exact correspondence between the original level values and the factor's underlying integer codes, then you can use as.numeric(f). However, if the original vector were

     f <- factor(c("2","3","2","3"))
    

    and you then changed the level labels to letters, as.numeric(f) would give misleading results (1 2 1 2 instead of 2 3 2 3), because the factor's integer codes always start at 1L.
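The pitfall above can be reproduced directly. The sketch below assumes the original factor is still available before relabeling: it saves the numeric values of the old levels (in level order) and later indexes them with the relabeled factor, which works because relabeling keeps the integer codes. The names `orig_values`, `codes_match_values`, `misleading` and `recovered` are hypothetical, introduced only for this illustration.

```r
f0 <- factor(c("2", "3", "2", "3"))

# Save the numeric values of the levels *before* relabeling: 2, 3
orig_values <- as.numeric(levels(f0))

# Hypothetical helper: TRUE only when the codes 1, 2, ... coincide
# with the numeric level values, i.e. when as.numeric(f) is safe
codes_match_values <- function(f) {
  vals <- suppressWarnings(as.numeric(levels(f)))
  isTRUE(all(vals == seq_along(vals)))
}

# Relabel the levels "2", "3" as "a", "b"
f <- factor(f0, levels = levels(f0), labels = c("a", "b"))

# The integer codes, not the original values: 1 2 1 2
misleading <- as.numeric(f)

# The codes used as indices into the saved values: 2 3 2 3
recovered <- orig_values[f]
print(recovered)
```

The values are kept alongside the factor rather than recovered from it afterwards, because once the labels replace the levels the original numbers are no longer stored in `f`.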