
Unexpected return for NA in factor lookup


I have a named character vector that I'm using as a lookup table.

condLookup = c(hotdog = "ketchup", ham = "mustard", popcorn = "salt", coffee = "cream")

This works as expected - I put in a 3-vector and get a 3-vector back:

condLookup[c("hotdog", "spinach", NA)]
  hotdog      <NA>      <NA> 
"ketchup"       NA        NA 

This too is expected, even though the returned values are all NA:

condLookup[c(NA, "spinach")]
<NA> <NA> 
  NA   NA 

And this:

condLookup["spinach"]
<NA> 
  NA 

But this surprised me: I passed a single NA, or two NAs, and got a named vector of four NAs back.

condLookup[NA]
<NA> <NA> <NA> <NA> 
  NA   NA   NA   NA 
condLookup[c(NA, NA)]
<NA> <NA> <NA> <NA> 
  NA   NA   NA   NA 

Apparently, for vector2 <- condLookup[vector1], vector2 has the same length as vector1 unless every element of vector1 is NA, in which case vector2 has the same length as condLookup. Can you explain this behavior?


Solution

  • NA values are typed, and the type matters: c(NA, "spinach") coerces NA to character, and a character index is matched against names rather than recycled. Compare a bare (logical) NA with NA_character_:

    condLookup[NA]
    ## <NA> <NA> <NA> <NA> 
    ##   NA   NA   NA   NA 
    
    condLookup[NA_character_]
    ## <NA> 
    ##   NA
    

    The default type of NA is logical. A logical index is recycled to the length of the vector being subset, while a character index is matched against the names of that vector. From ?"[":

    Character vectors will be matched to the ‘names’ of the object

    ... ‘i’, ‘j’, ‘...’ can be logical vectors, indicating elements/slices to select. Such vectors are recycled if necessary to match the corresponding extent.
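
The distinction above can be checked directly. The sketch below reuses condLookup from the question. The `lookup` helper is a hypothetical name, not part of base R. It shows one possible way to force a character index so that the result always has the same length as the keys.

```r
condLookup <- c(hotdog = "ketchup", ham = "mustard", popcorn = "salt", coffee = "cream")

# A bare NA is logical, so it is treated as a logical mask and recycled
# to the length of condLookup.
typeof(NA)                             # "logical"
length(condLookup[NA])                 # 4
length(condLookup[c(NA, NA)])          # 4 (still logical, still recycled)

# Combining NA with a string coerces the whole index to character,
# so each element is matched against names(condLookup) instead.
typeof(c(NA, "spinach"))               # "character"
length(condLookup[c(NA, "spinach")])   # 2

# Hypothetical helper (an assumption, not from the original answer):
# coerce the keys to character before subsetting.
lookup <- function(table, keys) unname(table[as.character(keys)])

lookup(condLookup, NA)                 # one NA
lookup(condLookup, c(NA, NA))          # two NAs
lookup(condLookup, c("ham", NA))       # "mustard" and NA
```

Coercing with as.character() turns a logical NA into NA_character_, which matches no name and therefore yields exactly one NA per key.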