I have two named lists which I'm using as lookup tables:
iState <- list("A" = "Alaska", "T" = "Texas", "G" = "Georgia")
sCap <- list("Alaska" = "Juneau", "Texas" = "Austin", "Georgia" = "Atlanta")
And a vector of keys to look up:
foo <- c("T", "G", "A", "B", NA)
This code chains them together and gives me the lookup I want:
library(dplyr)  # provides %>% and na_if()
sCap[iState[foo] %>% as.character() %>% na_if("NULL")] %>% as.character() %>% na_if("NULL")
# [1] "Austin" "Atlanta" "Juneau" NA NA
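To see why the na_if("NULL") steps are needed: indexing a named list by a name it doesn't contain (here "B" and NA) returns a NULL element, and as.character() renders each NULL element as the literal string "NULL". A small base-R check, reusing the iState and foo definitions from above:

```r
iState <- list("A" = "Alaska", "T" = "Texas", "G" = "Georgia")
foo <- c("T", "G", "A", "B", NA)

step <- iState[foo]

# Unmatched keys ("B" and NA) come back as NULL list elements
null_elems <- vapply(step, is.null, logical(1))
print(null_elems)

# as.character() turns each NULL element into the string "NULL" (and drops the names)
chars <- as.character(step)
print(chars)
```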
Is this the most execution-time-efficient way to chain these lookup tables together, or is there a better way?
You can do a lot better if you use lookup vectors instead of lookup lists. Basically, I changed list() to c(), and then cut out all the as.character() and na_if() bits.
vState <- c("A" = "Alaska", "T" = "Texas", "G" = "Georgia")
vCap <- c("Alaska" = "Juneau", "Texas" = "Austin", "Georgia" = "Atlanta")
vCap[vState[foo]]  # result is named; wrap it in unname() to drop the names
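If the lookup tables already exist as named lists, they don't have to be retyped. A sketch, assuming every list element is a single string: unlist() flattens each list into a named character vector. Character indexing of a named vector returns NA (not NULL) for unmatched keys, so no clean-up is needed, and unname() drops the names carried along from vCap.

```r
iState <- list("A" = "Alaska", "T" = "Texas", "G" = "Georgia")
sCap <- list("Alaska" = "Juneau", "Texas" = "Austin", "Georgia" = "Atlanta")

# Flatten the named lists into named character vectors (names are kept)
vState <- unlist(iState)
vCap <- unlist(sCap)

foo <- c("T", "G", "A", "B", NA)

# Unmatched keys ("B" and NA) give NA at each step; unname() strips the names
res <- unname(vCap[vState[foo]])
print(res)
```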
Benchmarking methods so far:
library(dplyr)  # provides %>%, na_if() and recode()
microbenchmark::microbenchmark(
  recode = foo %>%
    dplyr::recode(!!!iState, .default = NA_character_) %>%
    dplyr::recode(!!!sCap, .default = NA_character_),
  lists = sCap[iState[foo] %>% as.character() %>% na_if("NULL")] %>% as.character() %>% na_if("NULL"),
  lists_no_pipe = na_if(as.character(sCap[na_if(as.character(iState[foo]), "NULL")]), "NULL"),
  vectors = unname(vCap[vState[foo]])
)
# Unit: microseconds
#           expr   min     lq    mean  median     uq   max neval
#         recode 227.1 244.05 305.203  268.05 319.55 591.1   100
#          lists 182.2 198.85 244.964  222.10 254.20 562.6   100
#  lists_no_pipe  11.4  13.25  17.726   15.45  18.70  64.5   100
#        vectors   2.5   3.85   5.269    4.90   6.40  12.9   100
If you want things to be as fast as possible, don't use %>%: each pipe adds some overhead. If you are doing complicated things, the extra microseconds from piping don't really matter. But in this case the operations themselves are so quick that the piping overhead accounts for most of the execution time: in the benchmark above, removing the pipes from the list version drops the median from about 222 to about 15 microseconds.
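If you like pipe syntax, R 4.1 and later also has a native pipe, |>, which the parser rewrites into an ordinary nested call, so it doesn't add the runtime overhead of %>%. A sketch assuming R >= 4.1; lookup() is a hypothetical helper introduced here because |> needs a function call on its right-hand side (the helper itself is still an ordinary function call).

```r
vState <- c("A" = "Alaska", "T" = "Texas", "G" = "Georgia")
vCap <- c("Alaska" = "Juneau", "Texas" = "Austin", "Georgia" = "Atlanta")
foo <- c("T", "G", "A", "B", NA)

# Hypothetical helper: look up keys in a named vector and drop the names
lookup <- function(keys, table) unname(table[keys])

res <- foo |> lookup(vState) |> lookup(vCap)
print(res)

# The pipe is resolved at parse time: the quoted expression is already a nested call
print(quote(foo |> lookup(vState) |> lookup(vCap)))
```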
You may be able to go even faster, especially if your lookup tables are large, by using a join to a keyed data.table instead.
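A minimal sketch of that idea, assuming the data.table package is installed; the table and column names (dtState, dtCap, code, state, capital) are made up for illustration. Each lookup is a join on the table's key, and with data.table's default nomatch = NA, unmatched keys such as "B" and NA come back as NA. Whether this beats the vector version depends on the size of your tables, so it is worth timing with your own data.

```r
library(data.table)

# Hypothetical keyed lookup tables (setting the key sorts and indexes them)
dtState <- data.table(code = c("A", "T", "G"),
                      state = c("Alaska", "Texas", "Georgia"),
                      key = "code")
dtCap <- data.table(state = c("Alaska", "Texas", "Georgia"),
                    capital = c("Juneau", "Austin", "Atlanta"),
                    key = "state")

foo <- c("T", "G", "A", "B", NA)

# First join: one row per element of foo, in foo's order; unmatched keys give NA
states <- dtState[J(foo), state]

# Second join: look up the capital for each state found above
caps <- dtCap[J(states), capital]
print(caps)
```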