
Is '['-subsetting by name slower than subsetting by index?


If we have a named array, say a 2-by-3 matrix with named columns,

amatrix <- cbind(a=1:2, b=3:4, c=5:6)
##      a b c
## [1,] 1 3 5
## [2,] 2 4 6

we can subset a column, say #2, by name or by index:

amatrix[, 'b']
## [1] 3 4
amatrix[, 2]
## [1] 3 4

Which of these two subsetting methods is faster, and by how much? I suspect that name subsetting should be slower, owing to string-matching, and wonder if I should take this into account when subsetting hundreds of thousands of arrays.

Another question and its answer report and explain why subsetting lists with [[ can be faster than with $, and vice versa, depending on the context. But I have not found any information on the present question about [.


Solution

  • We can do an experiment:

    # long named vector
    v <- setNames(
      1:1e6,
      paste0('V', 1:1e6)
    )
    
    b <- bench::mark(
      index_by_position = v[1000],
      index_by_name = v['V1000'],
      min_time = 10
    )
    plot(b)
    
    # A tibble: 2 × 13
      expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory    
      <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>    
    1 index_by_…    412ns    501ns  1594553.        0B     0    10000     0     6.27ms <int>  <Rprofmem>
    2 index_by_…   1.12ms   1.64ms      486.    7.63MB     5.71  2977    35      6.12s <int>  <Rprofmem>
    # ℹ 2 more variables: time <list>, gc <list>
    

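    Indexing by name is substantially slower here: a median of 1.64ms versus 501ns on this one-million-element vector, roughly three thousand times slower. It also allocates 7.63MB per call, where indexing by position allocates nothing.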

    Playing around a bit, this performance difference:

    • appears to be very similar for a 1-by-N matrix,
    • becomes larger as the vector grows.
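
The two observations above can be checked with a rough sketch like the following. It uses base R's system.time instead of bench, so the numbers are coarser than in the benchmark above, and the helper name time_lookup and its arguments reps and as_matrix are made up for illustration. The last few lines show one possible workaround, assuming the same name is looked up many times against arrays that share the same names: resolve the name to a position once with match() and reuse that position. Whether this pays off depends on the workload, so it is worth measuring on real data.

```r
# Sketch: time name vs. position lookup at several sizes, base R only.
# `time_lookup`, `reps` and `as_matrix` are hypothetical names for illustration.
time_lookup <- function(n, reps = 200, as_matrix = FALSE) {
  nms <- paste0("V", seq_len(n))
  x <- if (as_matrix) {
    matrix(seq_len(n), nrow = 1, dimnames = list(NULL, nms))  # 1-by-n, named columns
  } else {
    setNames(seq_len(n), nms)                                  # named vector
  }
  key <- nms[n]  # look up the last element, taking its name from the vector itself
  by_name <- system.time(
    for (i in seq_len(reps)) if (as_matrix) x[, key] else x[key]
  )[["elapsed"]]
  by_position <- system.time(
    for (i in seq_len(reps)) if (as_matrix) x[, n] else x[n]
  )[["elapsed"]]
  c(n = n, by_name = by_name, by_position = by_position)
}

sizes <- c(1e3, 1e4, 1e5)
vec_timings <- t(sapply(sizes, time_lookup))                    # one row per size
mat_timings <- t(sapply(sizes, time_lookup, as_matrix = TRUE))  # same, 1-by-n matrix
print(vec_timings)
print(mat_timings)

# Possible workaround: resolve the name to a position once, then reuse it.
v <- setNames(1:5, paste0("V", 1:5))
idx <- match("V3", names(v))           # single string match
stopifnot(identical(v[idx], v["V3"]))  # same result as subsetting by name
```

Timings are in seconds for all reps lookups combined and will vary by machine and R version, so compare the by_name and by_position columns within a row rather than reading absolute values.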