I'm writing a loop with the combn function in R. My goal is to obtain every combination, of every size from 2 up to 133, from a character vector holding the names of 133 variables. However, when I run my code, the routine stops with the following message: "Error: Unable to allocate vector of size 2.4Gb". It happens when the routine reaches combinations of size five.
There is a condition that would reduce the number of combinations and make the list smaller: I only want to keep combinations that contain the string "var_pib_nsa_interanual".
Below is my code:
gc() #garbage collection - clear memory.
memory.limit(9999999999) # increase the memory limit (Windows only; a stub with no effect since R 4.2.0).
nome_series <- c("x.consu_energia_brasil_total","x.massa_sal_ampl","x.ibc_br","x.selic_aa","x.base_monetaria","x.ipca_adm_var","x.ipca_livre_var","x.ipca_alim_domic_var","x.cambio_real_efet_ipca","x.tx_media_juros_pf_total","x.utiliza_capac_ocio_fgv","x.pib_mensal","x.ipca_cheio_var_nsa","x.consu_energia_industr","x.consu_energia_residencia","x.consu_energia_comercial","x.economic_conditions","x.house_index","x.sales_vehicles","x.sales_credit","x.usa_economic","x.Anfavea_Producao_de_automoveis_e_comerciais_leves","x.Anfavea_Producao_de_caminhoes_e_onibus","x.Fenabrave_Licenciamento_de_veiculos_novos","x.Funcex_Exportacao_total","x.Funcex_Exportacao_de_manufaturados","x.Funcex_Importacao_total","x.IAB_Producao_de_aco_bruto","x.capacity_utilization","x.exportssurvey","x.Abraciclo_Producao_de_motocicletas","x.industrial_production","x.manu_production","x.civil","x.prod_agro","x.civil_price","x.services","x.inflation","x.ABAL_Producao_de_aluminio_primario","x.exchangerate","x.consumer_survey","x.Receita_Federal_Arrecadacao_de_IPI","x.Receita_Federal_Arrecadacao_de_IR","x.Receita_Federal_Arrecadacao_IOF","x.BNDES_COM","x.BNDES_TOTAL","x.businesssurvey","x.igp_m_var_mensal","x.interestrate2","x.Funcex_Importacao_de_materias_primas","x.alcool","x.carne_bovina","x.carne_suina","x.carnes_aves","x.celulose","x.minerio_ferro",...)
num_elementos <- 1:5
combinacoes_possiveis <- list()
comb <- list()
for (i in 2:length(num_elementos)) {
    combinacoes_possiveis[[i]] <- combn(nome_series,
                                        num_elementos[i],
                                        simplify = FALSE)
    comb[[i]] <- Filter(function(x){"var_pib_nsa_interanual" %in% x},
                        combinacoes_possiveis[[i]])
    combinacoes_possiveis[[i]] <- NULL
}
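To see why the loop runs out of memory, it helps to count the combinations. The short sketch below uses only base R's choose(); the variable names n, sizes, counts and grand_total are illustrative, and 133 is taken from the question. At size five there are already about 321 million combinations, and summed over every size from 2 to 133 the total is 2^133 - 134, on the order of 10^40, far more than could ever be stored.

```r
## Number of combinations of each size drawn from the 133 names
n <- 133
sizes <- 2:5
counts <- choose(n, sizes)
names(counts) <- sizes
print(counts)

## Total over every size from 2 to n: all 2^n subsets minus the
## empty set and the n single-element sets
grand_total <- sum(choose(n, 2:n))
print(grand_total)
```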
Generally speaking, when you are dealing with memory problems, iterators are a great way to get around this hardware limitation.
For example, the packages RcppAlgos (I am the author) and arrangements offer combinatorial iterators. Instead of generating everything up front and then filtering, you generate one combination at a time and carry out your check on it. Something like:
library(RcppAlgos)
iter <- comboIter(nome_series, num_elementos[i])
res <- iter@nextIter()
while (!is.null(res)) {
    ## Your check here, as well as what you want
    ## to do with the current combination if the
    ## check is successful
    if ("var_pib_nsa_interanual" %in% res) {
        ## ... use res here ...
    }
    res <- iter@nextIter()
}
And with the package arrangements, we have:
library(arrangements)
iter <- icombinations(nome_series, num_elementos[i])
res <- iter$getnext()
while (!is.null(res)) {
    ## Same as above
    res <- iter$getnext()
}
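To make the iterator idea concrete without relying on either package, here is a minimal base-R sketch of what such an iterator does. The helper next_combo() is hypothetical (it is not part of RcppAlgos or arrangements, whose internals may differ): it advances a sorted index vector to the next combination in lexicographic order and returns NULL when there are none left, so only the current combination is held in memory. The letters and the target "b" are illustrative stand-ins.

```r
## Hypothetical helper: given the current sorted index vector idx drawn
## from 1:n, return the next combination in lexicographic order, or
## NULL when idx is already the last one.
next_combo <- function(idx, n) {
    m <- length(idx)
    ## Find the rightmost position that can still be incremented
    j <- m
    while (j >= 1 && idx[j] == n - m + j) j <- j - 1
    if (j == 0) return(NULL)
    idx[j] <- idx[j] + 1L
    ## Reset every position to the right to its smallest valid value
    if (j < m) idx[(j + 1):m] <- idx[j] + seq_len(m - j)
    idx
}

## Walk through all 3-element combinations of 5 letters, keeping only
## those that contain "b"; one combination is generated at a time.
v <- letters[1:5]
idx <- 1:3
kept <- list()
while (!is.null(idx)) {
    cur <- v[idx]
    if ("b" %in% cur) kept[[length(kept) + 1]] <- cur
    idx <- next_combo(idx, length(v))
}
```

With the real data, v would be nome_series, idx would start at seq_len(k) for the size k of interest, and the body of the if would hold whatever you want to do with a matching combination.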
A better solution is to think a little harder about your actual problem. It looks like you only want combinations that contain a certain value. In these instances, simply leave this value out, generate all combinations of n - 1 choose m - 1, and then tack your value of choice onto each one. Remember, with combinations, order doesn't matter, so c(1, 2, 3) is equivalent to c(2, 1, 3). This produces exactly the combinations you want and nothing else: of the n choose m combinations, only n - 1 choose m - 1 contain the value.
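The saving is easy to quantify: choose(n - 1, m - 1) / choose(n, m) simplifies to m / n. A quick base-R check, using n = 133 and the sizes 2 to 5 from the question's loop (the variable names are illustrative):

```r
n <- 133
m <- 2:5
all_combos  <- choose(n, m)          # everything combn() would build
with_target <- choose(n - 1, m - 1)  # only those containing the fixed value
share <- with_target / all_combos
## The share equals m / n, i.e. under 4% of the combinations at m = 5
print(data.frame(m, all_combos, with_target, share))
```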
Below is a small example. Let's say we want only combinations of 5 choose 3 where 2 is in the combination. Using the method outlined by the OP, we have:
Filter(\(x) 2 %in% x, combn(5, 3, simplify = FALSE))
# [[1]]
# [1] 1 2 3
#
# [[2]]
# [1] 1 2 4
#
# [[3]]
# [1] 1 2 5
#
# [[4]]
# [1] 2 3 4
#
# [[5]]
# [1] 2 3 5
#
# [[6]]
# [1] 2 4 5
Now, using the method outlined above, we have:
## If the exact output format doesn't matter, below is a very simple method
combn(c(1, 3, 4, 5), 2, \(x) c(x, 2))
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 1 1 1 3 3 4
# [2,] 3 4 5 4 5 5
# [3,] 2 2 2 2 2 2
## Or using rbind
rbind(2, combn(c(1, 3, 4, 5), 2))
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 2 2 2 2 2 2
# [2,] 1 1 1 3 3 4
# [3,] 3 4 5 4 5 5
## Or if the output really needs to be a list
combn(c(1, 3, 4, 5), 2, \(x) c(x, 2), simplify = FALSE)
# [[1]]
# [1] 1 3 2
#
# [[2]]
# [1] 1 4 2
#
# [[3]]
# [1] 1 5 2
#
# [[4]]
# [1] 3 4 2
#
# [[5]]
# [1] 3 5 2
#
# [[6]]
# [1] 4 5 2
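Applied to the loop from the question, the idea might look like the sketch below. Because the question's nome_series is truncated, nomes is a hypothetical five-element stand-in, and alvo, resto and tamanhos are illustrative names. With the real vector, each size k still materializes choose(132, k - 1) combinations, so for larger k the iterator approach may still be needed.

```r
## Hypothetical stand-in for nome_series (the real vector is truncated)
nomes <- c("x.ibc_br", "x.selic_aa", "var_pib_nsa_interanual",
           "x.pib_mensal", "x.civil")
alvo  <- "var_pib_nsa_interanual"  # value every combination must contain
resto <- setdiff(nomes, alvo)      # all the other names

## For each size k, choose k - 1 of the other names and tack alvo on,
## instead of building every combination and filtering afterwards
tamanhos <- 2:5
comb <- lapply(tamanhos, function(k)
    combn(resto, k - 1, \(x) c(alvo, x), simplify = FALSE))
names(comb) <- tamanhos
```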
Of course, if memory is still an issue, we could use iterators along with this approach. In fact, RcppAlgos::comboIter offers a FUN argument similar to combn, which allows you to get the desired result while also keeping memory low:
iter <- comboIter(c(1, 3, 4, 5), 2, FUN = \(x) c(x, 2))
iter@nextIter()
# [1] 1 3 2
## Get more than one at a time
iter@nextNIter(2)
# [[1]]
# [1] 1 4 2
#
# [[2]]
# [1] 1 5 2
## Get the remaining combinations
iter@nextRemaining()
# [[1]]
# [1] 3 4 2
#
# [[2]]
# [1] 3 5 2
#
# [[3]]
# [1] 4 5 2
You can also make use of the FUN.VALUE argument to get simplified output:
iter <- comboIter(c(1, 3, 4, 5), 2, FUN = \(x) c(x, 2),
                  FUN.VALUE = c(1, 2, 3))
iter@nextNIter(3)
# [,1] [,2] [,3]
# [1,] 1 3 2
# [2,] 1 4 2
# [3,] 1 5 2