Tags: r, algorithm, machine-learning, associations, arules

Association rules between many continuous variables


I have a large dataset and I'm trying to mine association rules between the variables.

My problem is that I have 160 variables in which to look for association rules, and more than 1800 itemsets.

Furthermore, my variables are continuous. To mine association rules I usually use the Apriori algorithm, but, as is well known, it requires categorical variables.
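A common workaround is to discretize each continuous variable into a few bins and treat every "product=bin" pair as a categorical item, which Apriori can then handle. The sketch below does this in base R for the restricted example. It assumes the quantity column is named `ordered_quantity`, and the three levels (low/mid/high) and their cut points are hypothetical, chosen only for illustration; `arules::discretize()` offers ready-made binning methods (e.g. equal-frequency or equal-interval) for the same purpose.

```r
# Restricted example from the question (column names are assumptions)
df <- data.frame(
  ID_Order = c("A.1", "A.1", "A.1", "B.2", "B.2", "C.3", "D.4", "D.4"),
  Model = c("typeX", "typeZ", "typeY", "typeX", "typeW", "typeZ", "typeX", "typeG"),
  ordered_quantity = c(20, 10, 5, 16, 12, 1, 8, 4),
  stringsAsFactors = FALSE
)

# Bin the continuous quantity into three levels: (0,5], (5,12], (12,Inf].
# These cut points are illustrative only; in practice derive them from the
# data, e.g. with quantile() or arules::discretize().
df$qty_level <- cut(df$ordered_quantity,
                    breaks = c(0, 5, 12, Inf),
                    labels = c("low", "mid", "high"))

# Each item is now a categorical "product=level" label
df$item <- paste(df$Model, df$qty_level, sep = "=")

# One character vector of items per order, named by ID_Order
item_list <- split(df$item, df$ID_Order)
print(item_list)
```

A list like `item_list` can be coerced with `as(item_list, "transactions")` from arules and mined with `apriori()` as usual. The trade-off is that the rules then depend on where the bins are cut, so it is worth trying more than one binning.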

Does anyone have any suggestions on what kind of algorithm I can use in this case?

A restricted example of my dataset is the following:

ID_Order   Model     ordered quantity
A.1        typeX     20
A.1        typeZ     10
A.1        typeY     5
B.2        typeX     16
B.2        typeW     12
C.3        typeZ     1
D.4        typeX     8
D.4        typeG     4
...

My goal is to mine association rules and correlations between different products, perhaps with a neural network algorithm in R. Does anyone have any suggestions on how to solve this problem?

Thanks in advance


Solution

  • You can create transactions from your dataset like this:

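    library(dplyr)

    This helper turns the Models of one order into a single list element, so that each ID_Order becomes one transaction:

    concat <- function(x) {
      list(as.character(x))
    }

    Group df (the order table from the question) by ID_Order and concatenate the Models of each order:

    orders <- df %>%
      group_by(ID_Order) %>%
      summarise(concat = concat(Model))

    pull() returns the concatenated Models as a list. Set its names from the ID_Order column of the same summary, so the names stay aligned with the grouped (sorted) orders even if df is not sorted by ID_Order:

    a_list <- orders %>% pull(concat)
    names(a_list) <- orders$ID_Order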
    

    Then you can use the package arules:

    library(arules)

    Get an object of class transactions:

    transactions <- as(a_list, "transactions")

    Extract the rules. Minimum support and minimum confidence are set with supp and conf, respectively:

    rules <- apriori(transactions,
                     parameter = list(supp = 0.1, conf = 0.5, target = "rules"))

    To inspect the rules use:

    inspect(rules)
    

    And this is what you get:

         lhs              rhs     support confidence lift      count
    [1]  {}            => {typeZ} 0.50    0.50       1.0000000 2    
    [2]  {}            => {typeX} 0.75    0.75       1.0000000 3    
    [3]  {typeW}       => {typeX} 0.25    1.00       1.3333333 1    
    [4]  {typeG}       => {typeX} 0.25    1.00       1.3333333 1    
    [5]  {typeY}       => {typeZ} 0.25    1.00       2.0000000 1    
    [6]  {typeZ}       => {typeY} 0.25    0.50       2.0000000 1    
    [7]  {typeY}       => {typeX} 0.25    1.00       1.3333333 1    
    [8]  {typeZ}       => {typeX} 0.25    0.50       0.6666667 1    
    [9]  {typeY,typeZ} => {typeX} 0.25    1.00       1.3333333 1    
    [10] {typeX,typeY} => {typeZ} 0.25    1.00       2.0000000 1    
    [11] {typeX,typeZ} => {typeY} 0.25    1.00       4.0000000 1
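To make the three measures concrete, the sketch below recomputes support, confidence and lift in base R (no arules) for two of the rules above. It hard-codes the four baskets of the restricted example; `support()` and `rule_metrics()` are hypothetical helper names used only for illustration. Support is the share of orders that contain the whole itemset, confidence is support(lhs + rhs) / support(lhs), and lift is confidence / support(rhs).

```r
# Baskets from the restricted example: one character vector per order
baskets <- list(
  A.1 = c("typeX", "typeZ", "typeY"),
  B.2 = c("typeX", "typeW"),
  C.3 = c("typeZ"),
  D.4 = c("typeX", "typeG")
)

# Support: fraction of baskets containing every item of the itemset
support <- function(itemset) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Support, confidence and lift of the rule lhs => rhs
rule_metrics <- function(lhs, rhs) {
  supp <- support(c(lhs, rhs))
  conf <- supp / support(lhs)
  lift <- conf / support(rhs)
  c(support = supp, confidence = conf, lift = lift)
}

rule_metrics(c("typeX", "typeZ"), "typeY")  # expected: 0.25, 1, 4 (rule [11])
rule_metrics("typeZ", "typeX")              # expected: 0.25, 0.5, 0.667 (rule [8])
```

With only four orders, supp = 0.1 requires an itemset to appear in 0.4 orders, i.e. in at least one, which is why rules with count 1 pass the threshold. The rules with an empty lhs ({}) simply report how often typeX and typeZ occur overall, since their confidence equals the support of the rhs.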