Beforehand
The most obvious answer to the title is that missing values are represented by NA in R. Dummy data:
x <- c("a", "NA", "<NA>", NA)
We can turn all elements of x into plain strings using x_paste0 <- paste0(x). After doing so, the second and fourth elements are the same ("NA"), and to my knowledge this is why there is no way to back-transform x_paste0 to x.
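A minimal base R sketch that makes the collapse explicit, reusing the x from above:

```r
x <- c("a", "NA", "<NA>", NA)
x_paste0 <- paste0(x)

# paste0() turns the genuine missing value into the two-character string "NA"
is.na(x)                               # FALSE FALSE FALSE  TRUE
is.na(x_paste0)                        # FALSE FALSE FALSE FALSE

# Elements 2 and 4 are now the same string, so nothing records
# which of them used to be missing
identical(x_paste0[2], x_paste0[4])    # TRUE
```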
addNA
But working with addNA indicates that it is not just NA itself that represents missing values. In x, only the last element is missing. Let's transform the vector:
x_new <- addNA(x)
x_new
[1] a NA <NA> <NA>
Levels: <NA> a NA <NA>
Interestingly, the fourth element, i.e. the missing value, is shown as <NA> and not as NA. Furthermore, it now looks the same as the third element. And we are told that there are no missing values, because any(is.na(x_new)) returns FALSE. At this point I would have thought that the information about which element is the missing one (the third or the fourth) is simply lost, as it was in x_paste0. But this is not true, because we can actually back-transform x_new. See:
as.character(x_new)
[1] "a" "NA" "<NA>" NA
How does as.character know that the third element is "<NA>" and the fourth is an actual missing value, i.e. NA?
That is probably just a display quirk of the base:::print.factor() method, which shows both the string "<NA>" and the genuine NA as <NA>.
x <- c("a", "NA", "<NA>", NA)
addNA(x)
# [1] a NA <NA> <NA>
# Levels: <NA> a NA <NA>
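As far as I can tell, print.factor() prints the element labels without quotes, and an unquoted character NA is displayed as <NA>. A minimal sketch reproducing the same collision with a plain character vector (y is just an illustrative name, no factor involved):

```r
y <- c("<NA>", NA)   # a literal string and a genuine missing value

# Quoted printing (the default) keeps the two apart
print(y)                  # [1] "<NA>" NA
# Unquoted printing renders both as <NA>
print(y, quote = FALSE)   # [1] <NA> <NA>

# The vector itself still distinguishes them
is.na(y)                  # FALSE  TRUE
```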
But the levels themselves are distinct:
levels(addNA(x))
# [1] "<NA>" "a" "NA" NA
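So there are no duplicated levels: here the first level is the string "<NA>" and the last one is a genuine NA. The conversion with as.character() simply looks up each element's level, which is how it can return "<NA>" for the third element and NA for the fourth.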
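To see what as.character() has to work with, one can inspect the integer codes underneath the factor. A minimal base R sketch (codes and lev are illustrative names; the level order, and therefore the codes, depends on the collation locale, so the comments avoid quoting them):

```r
x <- c("a", "NA", "<NA>", NA)
x_new <- addNA(x)

# A factor stores one integer code per element plus a vector of levels
codes <- as.integer(x_new)
lev   <- levels(x_new)

# No code is missing, which is why is.na(x_new) is all FALSE
anyNA(codes)                            # FALSE

# Looking each code up in the levels recovers the original values,
# including the genuine NA that addNA() appended as the last level
back <- lev[codes]
identical(back, x)                      # TRUE
identical(as.character(x_new), back)    # TRUE
```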