Background - I want to exhaustively search the set of all possible combinations of 250 rows taken 10 at a time. To generate these iteratively, I use the following code:
```r
## Function definition
gen.next.cbn <- function(cbn, n){
  ## Generates the combination that follows the one provided as input
  cbn.bin <- rep(0, n)
  cbn.bin[cbn] <- 1
  if (tail(cbn.bin, 1) == 0){
    ind <- tail(which(cbn.bin == 1), 1)
    cbn.bin[c(ind, ind+1)] <- c(0, 1)
  } else {
    ind <- 1 + tail(which(diff(cbn.bin) == -1), 1)
    nb <- sum(cbn.bin[-c(1:ind)] == 1)
    cbn.bin[c(ind-1, (n-nb+1):n)] <- 0
    cbn.bin[ind:(ind+nb)] <- 1
  }
  cbn <- which(cbn.bin == 1)
}

## Example parameters
n <- 40
k <- 10

## Iteration example
for (i in 1:choose(n, k)){
  if (i == 1){
    cbn <- 1:k
  } else {
    cbn <- gen.next.cbn(cbn, n)
  }
  print(cbn)
}
```
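For a small case, the successor logic above can be verified against `utils::combn`, which also enumerates combinations in lexicographic order (this sketch assumes `gen.next.cbn` from the code above is already defined in the session):

```r
## Sanity check: compare gen.next.cbn (defined above) against utils::combn
n <- 5
k <- 3
ref <- t(combn(n, k))   # all choose(5, 3) = 10 combinations, one per row
cbn <- 1:k
for (i in 1:choose(n, k)) {
  if (i > 1) cbn <- gen.next.cbn(cbn, n)
  stopifnot(all(cbn == ref[i, ]))
}
```

If the loop completes without error, the iterative generator agrees with `combn` on every combination.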
I get the error "cannot allocate vector of size n GB" when I go beyond 40 rows.
Ideal solution: a) the combinations can be dumped and memory flushed after every iteration of the loop (where I can check further conditions), or b) the combinations can be dumped to a CSV file so that they do not hog memory.
Thanks for your support.
As I said in the comments, `iterpc` is the way to go for such a task. You first need to initialize an iterator via the `iterpc` function. Next, we can generate the next `n` combinations via `getnext`. After this, we simply append our results to a `csv` file (or any file type you like).
```r
library(iterpc)

getComboChunks <- function(n, k, chunkSize, totalCombos, myFile) {
    myIter <- iterpc(n, k)
    ## initialize myFile with the first chunk
    myCombs <- getnext(myIter, chunkSize)
    write.table(myCombs, file = myFile, sep = ",", col.names = FALSE)
    maxIteration <- (totalCombos - chunkSize) %/% chunkSize
    for (i in 1:maxIteration) {
        ## get the next "chunkSize" combinations
        myCombs <- getnext(myIter, chunkSize)
        ## append the above combinations to your file
        write.table(myCombs, file = myFile, sep = ",",
                    col.names = FALSE, append = TRUE)
    }
}
```
For example, `getComboChunks(250, 10, 100, 1000, "myCombos.csv")` will write out 1000 combinations of 250 choose 10 to the file `myCombos.csv`, 100 combinations at a time. Writing in chunks is more efficient than writing one combination at a time.
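One caveat: the integer division above silently drops any remainder when `totalCombos` is not a multiple of `chunkSize`. A minimal variant that also writes the leftover chunk, using only the `iterpc`/`getnext` calls already shown (`getComboChunksSafe` is a hypothetical name, not part of the package):

```r
library(iterpc)

## Sketch: write totalCombos combinations in chunks, including any
## final partial chunk when totalCombos %% chunkSize != 0
getComboChunksSafe <- function(n, k, chunkSize, totalCombos, myFile) {
    myIter <- iterpc(n, k)
    written <- 0
    while (written < totalCombos) {
        ## request at most chunkSize, but never more than remain
        sz <- min(chunkSize, totalCombos - written)
        myCombs <- getnext(myIter, sz)
        write.table(myCombs, file = myFile, sep = ",",
                    col.names = FALSE, append = written > 0)
        written <- written + sz
    }
}
```

For example, `getComboChunksSafe(250, 10, 100, 1050, "myCombos.csv")` would write ten full chunks of 100 followed by one chunk of 50.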
This library is written in C/C++, so it should be fairly efficient, but as @Florian points out in the comments, it won't produce all `gmp::chooseZ(250, 10) = 219005316087032475` combinations any time soon. I haven't tested it, but if you settle for 200 choose 5, I think you will be able to produce them in under a day (just over 2.5 billion results).