If we have a named array, say a 2-by-3 matrix:
amatrix <- cbind(a=1:2, b=3:4, c=5:6)
##      a b c
## [1,] 1 3 5
## [2,] 2 4 6
we can subset a column, say #2, by name or by index:
amatrix[, 'b']
## [1] 3 4
amatrix[, 2]
## [1] 3 4
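Conceptually, subsetting by name amounts to translating the name into a position and then subsetting by that position. A minimal sketch of that translation with base R's match(), shown only to illustrate the semantics (it is not a claim about how the C code behind `[` is actually implemented):

```r
amatrix <- cbind(a = 1:2, b = 3:4, c = 5:6)

# Translate the column name into its position among the column names
j <- match('b', colnames(amatrix))
j
## [1] 2

# Subsetting by the looked-up position gives the same result as by name
identical(amatrix[, j], amatrix[, 'b'])
## [1] TRUE
```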
Which of these two subsetting methods is faster, and by how much? I suspect that subsetting by name should be slower, owing to string matching, and wonder whether I should take this into account when subsetting hundreds of thousands of arrays.
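If the same column is needed from many arrays that share a layout, one way to sidestep repeated string matching is to resolve the name to a position once and reuse it. A sketch, assuming (hypothetically) that every matrix in the list `mats` has the same column names in the same order:

```r
# Hypothetical list of many matrices sharing the same column layout
mats <- replicate(1000, cbind(a = 1:2, b = 3:4, c = 5:6), simplify = FALSE)

# Resolve the column name to a position once, outside the loop
j <- match('b', colnames(mats[[1]]))

# Subset every matrix by position
cols_by_position <- lapply(mats, function(m) m[, j])

# Same result as subsetting each matrix by name
cols_by_name <- lapply(mats, function(m) m[, 'b'])
identical(cols_by_position, cols_by_name)
## [1] TRUE
```

The shared-layout assumption matters: if some matrix had its columns in a different order, the positional version would silently return the wrong column, whereas the name-based version would not. Whether the saving is worth that risk likely depends on how many names each lookup has to search.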
One question and its answer interestingly report and explain why subsetting lists by `[[` can be faster than by `$`, and vice versa, depending on the context. But I have not found any information regarding the present question about `[`.
We can do an experiment:
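# Long named vector: 1e6 integers named 'V1', 'V2', ...
v <- setNames(
  1:1e6,
  paste0('V', 1:1e6)
)

# Benchmark the same lookup by position and by name.
# min_time = 10: keep iterating each expression for up to 10 seconds
# (bench also stops at max_iterations, 10000 by default)
b <- bench::mark(
  index_by_position = v[1000],
  index_by_name = v['V1000'],
  min_time = 10
)
# plot() on a bench_mark object requires the ggplot2 package
plot(b)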
# A tibble: 2 × 13
  expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>
1 index_by_…    412ns    501ns  1594553.        0B        0 10000     0     6.27ms <int>  <Rprofmem>
2 index_by_…   1.12ms   1.64ms      486.    7.63MB     5.71  2977    35      6.12s <int>  <Rprofmem>
# ℹ 2 more variables: time <list>, gc <list>
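For readers without the bench package, a rough base-R version of the same comparison can be made with system.time(), repeating each lookup in a loop. This is a sketch: the repetition count is an arbitrary choice, and these timings are machine dependent and much noisier than those from bench::mark(). The code only checks that both lookups return the same element.

```r
v <- setNames(1:1e6, paste0('V', 1:1e6))

n_rep <- 100  # number of repetitions; an arbitrary choice

# Time repeated positional lookups
t_position <- system.time(for (i in seq_len(n_rep)) x_pos <- v[1000])

# Time repeated name lookups
t_name <- system.time(for (i in seq_len(n_rep)) x_name <- v['V1000'])

# Both lookups return the same named element
identical(x_pos, x_name)
## [1] TRUE

# Elapsed seconds for each approach (machine dependent)
rbind(position = t_position[['elapsed']], name = t_name[['elapsed']])
```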
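It appears that indexing by name is substantially slower: in this run the median time was 1.64ms by name versus 501ns by position, more than 3,000 times slower. The name lookup also allocated 7.63MB, while positional indexing allocated nothing.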
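One hypothesis worth probing is that the cost of a name lookup grows with the length of the names vector rather than with the position of the target element. A sketch of such a probe, which always looks up the same early element ('V1000') so that only the vector length changes; the lengths and repetition count are arbitrary choices, and the timings are machine dependent:

```r
# Hypothetical probe: does a single name lookup get slower as the vector grows?
lengths_to_try <- c(1e4, 1e5, 1e6)
n_rep <- 20  # arbitrary number of repetitions per length

elapsed_by_length <- sapply(lengths_to_try, function(n) {
  x <- setNames(seq_len(n), paste0('V', seq_len(n)))
  # Same target element each time, so only length(x) varies
  system.time(for (i in seq_len(n_rep)) x['V1000'])[['elapsed']]
})
names(elapsed_by_length) <- format(lengths_to_try, scientific = TRUE)
elapsed_by_length
```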
Playing around a bit, this performance difference: