Wondering if there's a faster way to achieve the following:
I have a vector of max. length long1
which is all zeros.
How do I generate a matrix of all possible combinations of ones in certain positions up to and including the maximum max1s
.
The following works, but seems to be rather inefficient when the return matrix is large e.g. >1e5.
### longest series of zeros
long1 <- 4
### max no. of 1s
max1s <- 2
### get combinations up to long1
f1 <- function(i) t(combinat::combn(seq.int(long1),
m=i, simplify=TRUE))
### list of positions in target matrix to be made 1s
### each list element represents column positions of 1s
l1 <- sapply(1:max1s, f1)
### no. rows in return matrix
nrow1 <- sum(unlist(lapply(l1, nrow)))
### set up matrix of zeros
c2 <- matrix(0L, nrow=nrow1, ncol=long1)
### rows to start at for each 'i' in length(l1) below
nrow2 <- c(1, 1+cumsum(unlist(lapply(l1, nrow))))
for (i in 1:length(l1)){
for (j in 1:nrow(l1[[i]])){
### now iterate over each row in that element of l1
### set relevant position in matrix to 1
c2[nrow2[i]+(j-1), l1[[i]][j, ] ] <- 1L
}}
In this case it's all combinations of 1
s, up to a max of 2, in a vector of length 4:
> c2
[,1] [,2] [,3] [,4]
[1,] 1 0 0 0
[2,] 0 1 0 0
[3,] 0 0 1 0
[4,] 0 0 0 1
[5,] 1 1 0 0
[6,] 1 0 1 0
[7,] 1 0 0 1
[8,] 0 1 1 0
[9,] 0 1 0 1
[10,] 0 0 1 1
I would prefer to avoid using combinat::hcube
then eliminating rows with more than a certain no. of 1
s as this approach will create needlessly large matrices for an application like this.
I guess you could just separately compute the combinations of each size using combn
and then use do.call
with rbind
to combine them all together:
allcombo <- function(long1, max1s) {
do.call(rbind, lapply(1:max1s, function(num1) {
t(apply(combn(long1, num1), 2, function(x) {
col = rep(0, long1)
col[x] = 1
col
}))
}))
}
I've stored your posted solution in function OP
. We can check they return the same values:
all.equal(OP(20, 5), allcombo(20, 5))
# [1] TRUE
Now we can benchmark (there are 21699 returned rows):
library(microbenchmark)
microbenchmark(OP(20, 5), allcombo(20, 5))
# Unit: milliseconds
# expr min lq median uq max neval
# OP(20, 5) 242.4120 256.5791 269.7237 292.7131 556.5984 100
# allcombo(20, 5) 150.4291 179.2588 188.4840 200.9898 448.2214 100
So this approach using combn
is a bit faster (30% on my computer for this set of parameters).