I have the following matrix prob.matx:
prob.matx <- structure(c(0.182, 0.212, 0.364, 0.242, 0.242, 0.152, 0.394,
0.212, 0.364, 0.091, 0.273, 0.273, 0.333, 0.242, 0.364, 0.061,
0.273, 0.333, 0.212, 0.182, 0.303, 0.333, 0.182, 0.182),
dim = c(4L, 6L),
dimnames = list(c("A", "C", "G", "T"),
c("V1", "V2", "V3", "V4", "V5", "V6")))
Which looks like this:
V1 V2 V3 V4 V5 V6
A 0.182 0.242 0.364 0.333 0.273 0.303
C 0.212 0.152 0.091 0.242 0.333 0.333
G 0.364 0.394 0.273 0.364 0.212 0.182
T 0.242 0.212 0.273 0.061 0.182 0.182
And a vector DNA:
DNA <- c("A", "C", "C", "C", "C", "A")
I would like to extract the matrix values according to the element and position in the DNA
vector, and return a vector. That is:
c(prob.matx[DNA[1], 1], prob.matx[DNA[2], 2], prob.matx[DNA[3], 3],
prob.matx[DNA[4], 4], prob.matx[DNA[5], 5], prob.matx[DNA[6], 6])
[1] 0.182 0.152 0.091 0.242 0.333 0.303
This seems very simple, but I am struggling to find a one-step way to do it WITHOUT using apply or a for loop.
I think @Darren's answer is efficient enough unless speed is critical. Building on it, we can gain some speed by indexing with integer positions (obtained via match) instead of character names.
Let's scale the problem up and benchmark both approaches:
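# 26 x 10000 matrix: one row per letter, one column per position
prob.matx <- matrix(rnorm(26 * 10000), 26, 10000, dimnames = list(LETTERS, paste0("V", 1:10000)))
# Note: DNA is ten times longer than ncol(prob.matx), so cbind() recycles the column index
DNA <- sample(LETTERS, 100000, replace = TRUE)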
bench::mark(
"Darren" = prob.matx[cbind(DNA, colnames(prob.matx))],
"TIC" = prob.matx[cbind(match(DNA, row.names(prob.matx)), 1:ncol(prob.matx))]
)
which gives
# A tibble: 2 × 13
expression min median itr/s…¹ mem_a…² gc/se…³ n_itr n_gc total…⁴ result
<bch:expr> <bch:tm> <bch:t> <dbl> <bch:b> <dbl> <int> <dbl> <bch:t> <list>
1 Darren 4.14ms 4.57ms 207. 6.69MB 19.3 86 8 415ms <dbl>
2 TIC 1.5ms 2ms 502. 3.09MB 18.9 213 8 424ms <dbl>
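As a sanity check, both indexing styles can be run on the small example from the question. This is a minimal sketch; the names by_name and by_pos are only illustrative. It assumes DNA has exactly one entry per column of prob.matx and that every entry is a valid row name.

```r
# Small example from the question
prob.matx <- structure(c(0.182, 0.212, 0.364, 0.242, 0.242, 0.152, 0.394,
                         0.212, 0.364, 0.091, 0.273, 0.273, 0.333, 0.242, 0.364, 0.061,
                         0.273, 0.333, 0.212, 0.182, 0.303, 0.333, 0.182, 0.182),
                       dim = c(4L, 6L),
                       dimnames = list(c("A", "C", "G", "T"),
                                       c("V1", "V2", "V3", "V4", "V5", "V6")))
DNA <- c("A", "C", "C", "C", "C", "A")

# Assumption: one DNA letter per matrix column
stopifnot(length(DNA) == ncol(prob.matx))

# Character matrix index: each row is a (row name, column name) pair
by_name <- prob.matx[cbind(DNA, colnames(prob.matx))]

# Integer matrix index: each row is a (row position, column position) pair
by_pos <- prob.matx[cbind(match(DNA, rownames(prob.matx)),
                          seq_len(ncol(prob.matx)))]

by_pos
# [1] 0.182 0.152 0.091 0.242 0.333 0.303
```

Both by_name and by_pos should hold the same six values. The integer version avoids looking up names inside the subsetting call, which is presumably where the speed difference in the benchmark comes from.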