
Extract matrix values according to vector elements and their positions, returning a vector


I have the following matrix prob.matx:

prob.matx <- structure(c(0.182, 0.212, 0.364, 0.242, 0.242, 0.152, 0.394, 
                         0.212, 0.364, 0.091, 0.273, 0.273, 0.333, 0.242, 0.364, 0.061, 
                         0.273, 0.333, 0.212, 0.182, 0.303, 0.333, 0.182, 0.182), 
                       dim = c(4L, 6L), 
                       dimnames = list(c("A", "C", "G", "T"), 
                                       c("V1", "V2", "V3", "V4", "V5", "V6")))

Which looks like this:

     V1    V2    V3    V4    V5    V6
A 0.182 0.242 0.364 0.333 0.273 0.303
C 0.212 0.152 0.091 0.242 0.333 0.333
G 0.364 0.394 0.273 0.364 0.212 0.182
T 0.242 0.212 0.273 0.061 0.182 0.182

And a vector DNA:

DNA <- c("A", "C", "C", "C", "C", "A")

I would like to extract the matrix values according to the element and position in the DNA vector, and return a vector. That is:

c(prob.matx[DNA[1], 1], prob.matx[DNA[2], 2], prob.matx[DNA[3], 3], 
  prob.matx[DNA[4], 4], prob.matx[DNA[5], 5], prob.matx[DNA[6], 6])

[1] 0.182 0.152 0.091 0.242 0.333 0.303

This seems very simple, but I am struggling to find a one-step way to do it WITHOUT using apply or a for loop.


Solution

  • I think @Darren's answer is efficient enough unless you have extreme demands on speed. Still, we can improve on it a little by indexing with integer positions instead of names.


    Let's scale the problem up and benchmark both approaches:

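    # 26 x 10000 test matrix; the 26000 rnorm() draws are recycled to fill it.
    prob.matx <- matrix(rnorm(26 * 1000), 26, 10000, dimnames = list(LETTERS, paste0("V", 1:10000)))
    # DNA is 10 times longer than ncol(prob.matx), so cbind() below recycles the column index.
    DNA <- sample(LETTERS, 100000, replace = TRUE)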
    
    bench::mark(
      "Darren" = prob.matx[cbind(DNA, colnames(prob.matx))],
      "TIC" = prob.matx[cbind(match(DNA, row.names(prob.matx)), 1:ncol(prob.matx))]
    )
    

    which gives:

    # A tibble: 2 × 13
      expression      min  median itr/s…¹ mem_a…² gc/se…³ n_itr  n_gc total…⁴ result
      <bch:expr> <bch:tm> <bch:t>   <dbl> <bch:b>   <dbl> <int> <dbl> <bch:t> <list>
    1 Darren       4.14ms  4.57ms    207.  6.69MB    19.3    86     8   415ms <dbl>
    2 TIC           1.5ms     2ms    502.  3.09MB    18.9   213     8   424ms <dbl>
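
To connect the benchmark back to the original question, here is a minimal sketch that applies both indexing styles to the small matrix and vector from the question. The names `by_name` and `by_position` are illustrative only, and the length check is an added assumption that `DNA` has exactly one element per column.

```r
# Small matrix and vector copied from the question.
prob.matx <- structure(c(0.182, 0.212, 0.364, 0.242, 0.242, 0.152, 0.394,
                         0.212, 0.364, 0.091, 0.273, 0.273, 0.333, 0.242, 0.364, 0.061,
                         0.273, 0.333, 0.212, 0.182, 0.303, 0.333, 0.182, 0.182),
                       dim = c(4L, 6L),
                       dimnames = list(c("A", "C", "G", "T"),
                                       c("V1", "V2", "V3", "V4", "V5", "V6")))
DNA <- c("A", "C", "C", "C", "C", "A")

# Assumption for this sketch: one DNA element per matrix column.
stopifnot(length(DNA) == ncol(prob.matx))

# Character index: each row of the two-column matrix is a
# (row name, column name) pair that R looks up in the dimnames.
by_name <- prob.matx[cbind(DNA, colnames(prob.matx))]

# Integer index: translate the letters to row numbers once with match(),
# then pair them with the column positions.
by_position <- prob.matx[cbind(match(DNA, rownames(prob.matx)), seq_along(DNA))]

print(by_name)
# [1] 0.182 0.152 0.091 0.242 0.333 0.303
```

Both calls return the same plain numeric vector. The integer version avoids building a character matrix and looking up names in the dimnames, which is a plausible reason it comes out ahead in the benchmark above.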