Efficient sampling of a factor variable from dataframe subsets

I have a dataframe df1 which contains 5 columns. I split df1 by two of them (var1 and var3), which results in a list of dataframes, ls1.

For each sub-dataframe x in ls1, I want to sample() x$var2 x$num times (with replacement), using x$probs as the probabilities, as follows:

Create data:

var1 <- rep(LETTERS[seq( from = 1, to = 3 )], each = 6)
var2 <- rep(LETTERS[seq( from = 1, to = 3 )], 6)
var3 <- rep(1:2,3, each = 3)
num <- rep(c(10, 11, 13, 8, 20, 5), each = 3)
probs <- round(runif(18), 2)
df1 <- as.data.frame(cbind(var1, var2, var3, num, probs))
ls1 <- split(df1, list(df1$var1, df1$var3))

Have a look at the first couple of list elements:

$A.1
  var1 var2 var3 num probs
1    A    A    1  10  0.06
2    A    B    1  10  0.27
3    A    C    1  10  0.23

$B.1
  var1 var2 var3 num probs
7    B    A    1  13  0.93
8    B    B    1  13  0.36
9    B    C    1  13  0.04

lapply over ls1:

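ls1 <- lapply(ls1, function(x) {
  # factor() with explicit levels keeps values that are never drawn, so every row gets a count (possibly 0)
  draws <- factor(sample(as.character(x$var2), size = as.numeric(as.character(x$num[1])),
    replace = TRUE, prob = as.numeric(as.character(x$probs))), levels = as.character(x$var2))
  res <- as.data.frame(table(draws))
  cbind(x, res = res$Freq)
})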
df2 <- do.call("rbind", ls1)
df2
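One subtlety in this loop: the res column only lines up with the rows of x if table() reports a count for every value of var2, including values that were never drawn. table() does that for a factor, but for a character vector (what as.data.frame() produces from a character matrix in R 4.0 and later) it silently drops unobserved values, and cbind() then fails because the lengths differ. A minimal illustration with made-up draws:

```r
# Three hypothetical draws in which category "C" never occurs
draws <- c("A", "A", "B")

# On a character vector, table() only counts values that actually occur
t_chr <- table(draws)

# With explicit factor levels, unobserved categories are kept with a count of 0
t_fac <- table(factor(draws, levels = c("A", "B", "C")))

print(t_chr)
print(t_fac)
```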

Have a look at the first couple of elements of ls1 after this step:

$A.1
  var1 var2 var3 num probs res
1    A    A    1  10  0.06   2
2    A    B    1  10  0.27   4
3    A    C    1  10  0.23   4

$B.1
  var1 var2 var3 num probs res
7    B    A    1  13  0.93  10
8    B    B    1  13  0.36   3
9    B    C    1  13  0.04   0

So for each dataframe a new variable res is created: the values of res sum to num, and each element of var2 is represented in res roughly in proportion to its probs value. This does what I want, but it becomes very slow when there is a lot of data.
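The prob weights passed to sample() do not need to sum to 1; they are normalized internally, so within a group each value of var2 is drawn with probability probs / sum(probs). A quick check with a large sample size, using the probs values from group A.1 above (the seed and sample size are arbitrary choices for illustration):

```r
set.seed(123)  # arbitrary seed, for reproducibility only
p <- c(A = 0.06, B = 0.27, C = 0.23)  # weights from group A.1; they sum to 0.56, not 1
n <- 1e5
draws <- sample(names(p), size = n, replace = TRUE, prob = p)

# Observed share of each value vs. the normalized weights
observed <- table(factor(draws, levels = names(p))) / n
expected <- p / sum(p)
round(rbind(observed = as.vector(observed), expected = expected), 3)
```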

My Question: is there a way to replace the lapply piece of code with something more efficient/faster?

I am just beginning to learn about vectorization and suspect this could be vectorized, but I am unsure how to achieve it.

ls1 is eventually turned back into a single dataframe, so if it doesn't need to become a list in the first place, all the better (although for this step it doesn't really matter how the data is structured).

Any help would be much appreciated.

Solution

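First, you should create df1 using data.frame() rather than as.data.frame(cbind(...)). cbind() builds a matrix, and a matrix can only hold one data type, so your numeric columns num and probs are coerced to character (and then to factor in R versions before 4.0). That is why you needed as.numeric(as.character(...)) in the first place.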

df1 <- data.frame(var1, var2, var3, num, probs)
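To see the coercion, compare the column types produced by the two constructions, using a reduced two-column version of the question's data:

```r
var1 <- rep(LETTERS[1:3], each = 6)
num <- rep(c(10, 11, 13, 8, 20, 5), each = 3)

# cbind() on a character and a numeric vector builds a character matrix
m <- cbind(var1, num)
typeof(m)  # "character"

# as.data.frame() keeps num as text (character, or factor before R 4.0)
df_bad <- as.data.frame(m)

# data.frame() keeps each vector's own type, so num stays numeric
df_good <- data.frame(var1, num)
str(df_good)
```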

Next, instead of using sample() followed by table(), use rmultinom(). It is more efficient here because it directly returns the number of draws for each value in x$var2, rather than generating x$num individual draws and then counting them:

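ls1 <- split(df1, list(df1$var1, df1$var3))  # re-split the correctly typed df1
ls1 <- lapply(ls1, function(x) {
    # one multinomial draw: a count for each row's var2 value, summing to num
    x$res <- rmultinom(1, x$num[1], x$probs)[, 1]
    x
})
df2 <- do.call("rbind", ls1)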
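rmultinom(n, size, prob) returns a matrix with one row per category and one column per draw, so a single draw over three categories is a 3 x 1 matrix. Taking its first column with [, 1] gives a plain vector that fits cleanly into a dataframe column. Like sample(), rmultinom() normalizes prob internally, so the weights do not have to sum to 1:

```r
set.seed(1)  # arbitrary seed
counts <- rmultinom(1, size = 10, prob = c(0.06, 0.27, 0.23))
dim(counts)  # 3 rows (categories) x 1 column (draw)
counts[, 1]  # plain vector of counts; always sums to size
```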

Because it skips generating and tabulating individual draws, this should be noticeably faster than the sample() approach, especially for large values of num.
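If the per-group lapply() overhead still matters, the split can be dropped entirely. The sketch below is a swapped-in technique rather than a built-in: rmultinom_grouped() is a hypothetical helper using the conditional binomial method. Within each group, the count for row j is drawn as rbinom(remaining draws, probs[j] / sum of probs for rows j onward), which together yields one multinomial draw per group. Since rbinom() is vectorized over size and prob, the loop runs once per row position within a group (3 times here) instead of once per group. It assumes num is constant within each group and that each group has at least one positive probs value, as in the question's data; interaction() stands in for the list(df1$var1, df1$var3) grouping used with split():

```r
# Hypothetical helper (not part of base R): one multinomial draw per group,
# vectorized across groups via the conditional binomial method.
# Assumes size is constant within each group and each group has a positive weight.
rmultinom_grouped <- function(size, prob, group) {
  g <- as.integer(factor(group))                     # group ids 1..G
  pos <- ave(seq_along(g), g, FUN = seq_along)       # row position within its group
  # weight of this row plus all later rows in the same group
  tail_w <- ave(prob, g, FUN = function(w) rev(cumsum(rev(w))))
  # probability of landing on this row, given no earlier row in the group was chosen
  q <- ifelse(tail_w > 0, pmin(1, prob / tail_w), 0)
  left <- size[match(seq_len(max(g)), g)]            # draws still to allocate, per group
  res <- numeric(length(g))
  for (j in seq_len(max(pos))) {
    idx <- which(pos == j)                           # at most one row per group
    gg <- g[idx]
    draw <- rbinom(length(idx), left[gg], q[idx])    # vectorized over groups
    res[idx] <- draw
    left[gg] <- left[gg] - draw
  }
  res
}

# Usage on the question's data (seed chosen arbitrarily)
set.seed(1)
var1 <- rep(LETTERS[1:3], each = 6)
var2 <- rep(LETTERS[1:3], 6)
var3 <- rep(1:2, 3, each = 3)
num <- rep(c(10, 11, 13, 8, 20, 5), each = 3)
probs <- round(runif(18), 2)
df1 <- data.frame(var1, var2, var3, num, probs)
df1$res <- rmultinom_grouped(df1$num, df1$probs, interaction(df1$var1, df1$var3))
head(df1)
```

This keeps everything in one flat dataframe, so no list or do.call("rbind", ...) step is needed.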