
How to work with fewer combinations of variables in R's combn function?


I'm building a loop with the combn function in R. My goal is to obtain every combination of 133 variable names (character strings) taken from a vector, for every combination size from 2 up to 133. However, when I run my code, the routine stops with the following message: "Error: Unable to allocate vector of size 2.4Gb". It happens when the routine reaches combinations of size five.

There is a condition that would reduce the number of combinations and make the list smaller: I want to keep only the combinations that contain the following string: "var_pib_nsa_interanual".

Below is my code:



gc() #garbage collection - clear memory.
memory.limit (9999999999) # increase the memory size.

nome_series <- c("x.consu_energia_brasil_total","x.massa_sal_ampl","x.ibc_br","x.selic_aa","x.base_monetaria","x.ipca_adm_var","x.ipca_livre_var","x.ipca_alim_domic_var","x.cambio_real_efet_ipca","x.tx_media_juros_pf_total","x.utiliza_capac_ocio_fgv","x.pib_mensal","x.ipca_cheio_var_nsa","x.consu_energia_industr","x.consu_energia_residencia","x.consu_energia_comercial","x.economic_conditions","x.house_index","x.sales_vehicles","x.sales_credit","x.usa_economic","x.Anfavea_Producao_de_automoveis_e_comerciais_leves","x.Anfavea_Producao_de_caminhoes_e_onibus","x.Fenabrave_Licenciamento_de_veiculos_novos","x.Funcex_Exportacao_total","x.Funcex_Exportacao_de_manufaturados","x.Funcex_Importacao_total","x.IAB_Producao_de_aco_bruto","x.capacity_utilization","x.exportssurvey","x.Abraciclo_Producao_de_motocicletas","x.industrial_production","x.manu_production","x.civil","x.prod_agro","x.civil_price","x.services","x.inflation","x.ABAL_Producao_de_aluminio_primario","x.exchangerate","x.consumer_survey","x.Receita_Federal_Arrecadacao_de_IPI","x.Receita_Federal_Arrecadacao_de_IR","x.Receita_Federal_Arrecadacao_IOF","x.BNDES_COM","x.BNDES_TOTAL","x.businesssurvey","x.igp_m_var_mensal","x.interestrate2","x.Funcex_Importacao_de_materias_primas","x.alcool","x.carne_bovina","x.carne_suina","x.carnes_aves","x.celulose","x.minerio_ferro",...)

num_elementos <- 1:5
combinacoes_possiveis <- list()
comb <- list()


for (i in 2:length(num_elementos)) {
    combinacoes_possiveis[[i]] <- combn(nome_series,
                                        num_elementos[i],
                                        simplify = FALSE
                                        )

    comb[[i]] <- Filter(function(x){"var_pib_nsa_interanual" %in% x},
                        combinacoes_possiveis[[i]]
                        )

    combinacoes_possiveis[[i]] <- NULL
}

Solution

  • Using Iterators

    Generally speaking, when you are dealing with memory problems, iterators are a great way to get around this hardware limitation.

    For example, the package RcppAlgos (I am the author) and the package arrangements offer combinatorial iterators. Instead of generating everything up front and then filtering, you would generate one combination at a time and carry out your check. Something like:

    library(RcppAlgos)
    iter <- comboIter(nome_series, num_elementos[i])
    res <- iter@nextIter()
    
    while (!is.null(res)) {
        ## Your check here as well as what you
        ## want to do with the current combination
        ## if the check is successful
        ## .
        ## .
        ## .
    
        res <- iter@nextIter()
    }
    

    And with the package arrangements we have:

    library(arrangements)
    iter <- icombinations(nome_series, num_elementos[i])
    res <- iter$getnext()
    
    while (!is.null(res)) {
        ## Same as above
        res <- iter$getnext()
    }
    

    Better Solution

    A better solution is to think a little harder about your actual problem. It looks like you only want combinations that contain a certain value. In that case, simply leave this value out, generate all combinations of the remaining n - 1 values taken m - 1 at a time, and then tack on your value of choice. Remember, with combinations, order doesn't matter, so c(1, 2, 3) is equivalent to c(2, 1, 3).
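
    To get a feel for how much this saves at the scale in the question, the two counts can be compared with base R's choose(). This is only a rough sketch: the variable names are made up for illustration, and the memory remark assumes a 64-bit build of R, where each list element takes at least an 8-byte pointer.

```r
n <- 133  # number of series names in the question
m <- 5    # combination size at which the original loop failed

# Generate everything first, filter afterwards
all_combos <- choose(n, m)
all_combos
# [1] 321402081

# Leave the required name out and choose the other m - 1 names
with_target <- choose(n - 1, m - 1)
with_target
# [1] 12082785

# The reduction factor is n / m
all_combos / with_target
# [1] 26.6
```

    A list of 321,402,081 elements needs roughly 2.4 GiB for the element pointers alone, which is in line with the reported allocation error. The savings shrink as m grows, so for large m even the reduced set is too big to hold in memory.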

    Below is a small example. Let's say we want only combinations of 5 choose 3 where 2 is in the combination. Using the method outlined by the OP, we have:

    Filter(\(x) 2 %in% x, combn(5, 3, simplify = FALSE))
    # [[1]]
    # [1] 1 2 3
    # 
    # [[2]]
    # [1] 1 2 4
    # 
    # [[3]]
    # [1] 1 2 5
    # 
    # [[4]]
    # [1] 2 3 4
    # 
    # [[5]]
    # [1] 2 3 5
    # 
    # [[6]]
    # [1] 2 4 5
    

    Now, using the method outlined above, we have:

    ## If the output format doesn't matter, below is a very simple method
    combn(c(1, 3, 4, 5), 2, \(x) c(x, 2))
    #      [,1] [,2] [,3] [,4] [,5] [,6]
    # [1,]    1    1    1    3    3    4
    # [2,]    3    4    5    4    5    5
    # [3,]    2    2    2    2    2    2
    
    ## Or using rbind
    rbind(2, combn(c(1, 3, 4, 5), 2))
    #      [,1] [,2] [,3] [,4] [,5] [,6]
    # [1,]    2    2    2    2    2    2
    # [2,]    1    1    1    3    3    4
    # [3,]    3    4    5    4    5    5
    
    ## Or if the output really needs to be a list
    combn(c(1, 3, 4, 5), 2, \(x) c(x, 2), simplify = FALSE)
    # [[1]]
    # [1] 1 3 2
    # 
    # [[2]]
    # [1] 1 4 2
    # 
    # [[3]]
    # [1] 1 5 2
    # 
    # [[4]]
    # [1] 3 4 2
    # 
    # [[5]]
    # [1] 3 5 2
    # 
    # [[6]]
    # [1] 4 5 2
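
    Applied to the setup in the question, the same idea could look roughly like the sketch below. The helper combos_containing is hypothetical, the short series vector stands in for the full 133 names, and it assumes "var_pib_nsa_interanual" is one of the elements of the vector.

```r
# Hypothetical helper: all combinations of the given sizes that contain `target`.
combos_containing <- function(series, target, sizes) {
  stopifnot(target %in% series, all(sizes >= 2))
  others <- setdiff(series, target)  # every name except the required one
  out <- lapply(sizes, function(m) {
    # Choose the other m - 1 names, then prepend the required one
    combn(others, m - 1, FUN = function(x) c(target, x), simplify = FALSE)
  })
  names(out) <- paste0("size_", sizes)
  out
}

# Small stand-in for the real vector of names (assumption for illustration)
series <- c("var_pib_nsa_interanual", "x.ibc_br", "x.selic_aa", "x.pib_mensal")
res <- combos_containing(series, "var_pib_nsa_interanual", 2:4)

lengths(res)
# size_2 size_3 size_4 
#      3      3      1 

res$size_2[[1]]
# [1] "var_pib_nsa_interanual" "x.ibc_br"
```

    No filtering step is needed, because every generated combination contains the required name by construction.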
    

    Of course, if memory is still an issue, we could use iterators along with this approach. In fact, RcppAlgos::comboIter offers a FUN argument, similar to the one in combn, which lets you get the desired result while also keeping memory usage low:

    iter <- comboIter(c(1, 3, 4, 5), 2, FUN = \(x) c(x, 2))
    iter@nextIter()
    # [1] 1 3 2
    
    ## Get more than one at a time
    iter@nextNIter(2)
    # [[1]]
    # [1] 1 4 2
    # 
    # [[2]]
    # [1] 1 5 2
    
    ## Get the remaining combinations
    iter@nextRemaining()
    # [[1]]
    # [1] 3 4 2
    # 
    # [[2]]
    # [1] 3 5 2
    # 
    # [[3]]
    # [1] 4 5 2
    

    You can also make use of the FUN.VALUE argument to get simplified output.

    iter <- comboIter(c(1, 3, 4, 5), 2, FUN = \(x) c(x, 2),
                      FUN.VALUE = c(1, 2, 3))
    iter@nextNIter(3)
    #      [,1] [,2] [,3]
    # [1,]    1    3    2
    # [2,]    1    4    2
    # [3,]    1    5    2