r, arules, market-basket-analysis

In arules, how do I turn a sparse data frame into transactions?


Hi, I have a sparse data frame of grocery orders like this:

library(arules)
# one row per order, one factor column per item (1 = ordered, 0 = not ordered)
a_df <- data.frame(
  apple = as.factor(c(1,0,0,0,1,1)),
  banana = as.factor(c(0,1,1,0,0,0)),
  peeler = as.factor(c(1,0,0,0,1,1)))

a_tran <- as(a_df, "transactions")
inspect(a_tran)
rules <- apriori(a_tran, parameter = list(minlen = 2, supp = 0.5, conf = 0.5))
inspect(rules)

However, the result includes 0s (items that were not ordered), like this:

lhs           rhs        support confidence lift count
{banana=0} => {apple=1}  0.5     0.6        1.2  3

How can I ignore the 0s in the data frame, or transform the data frame into something like this?

order 1: apple, peeler
order 2: banana

Thanks.


Solution

  • Here are a few options:

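    library(arules)
    library(magrittr)
    # (row, column) positions of every cell equal to 1
    idx <- which(a_df == 1, arr.ind = TRUE)
    # group the item names by order (row) number
    (lst <- split(names(a_df)[idx[, 2]], idx[, 1]))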
    # $`1`
    # [1] "apple"  "peeler"
    # 
    # $`2`
    # [1] "banana"
    # 
    # $`3`
    # [1] "banana"
    # 
    # $`5`
    # [1] "apple"  "peeler"
    # 
    # $`6`
    # [1] "apple"  "peeler"
    
    # helper: coerce x to transactions and mine rules;
    # `app` is passed on to apriori()'s `appearance` argument
    rules <- function(x, app = NULL) {
      x %>%
        as("transactions") %>%
        apriori(parameter = list(minlen = 2, supp = 0.5, conf = 0.5), appearance = app)
    }
    # use a list without "0"s:
    lst %>% rules %>% inspect
    # filter "0"s afterwards:
    a_df %>% rules %>% subset(!lhs %pin% "0" & !rhs %pin% "0") %>% inspect
    # filter "0"s in apriori:
    a_df %>% rules(list(none = paste(names(a_df), "0", sep = "="), default = "both")) %>% inspect
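
One more route that may be worth trying, sketched here as an alternative rather than as part of the options above: turn the data frame into a logical matrix and coerce that to transactions. This assumes every column is a 0/1 factor, as in the question. The names `m` and `rules_lgl` are made up for this illustration.

```r
library(arules)

# Same data as in the question
a_df <- data.frame(
  apple  = as.factor(c(1, 0, 0, 0, 1, 1)),
  banana = as.factor(c(0, 1, 1, 0, 0, 0)),
  peeler = as.factor(c(1, 0, 0, 0, 1, 1))
)

# as.matrix() on all-factor columns gives a character matrix of "0"/"1";
# comparing with "1" yields a logical matrix (TRUE = item was ordered)
m <- as.matrix(a_df) == "1"

# A logical matrix coerces directly to transactions, one row per order,
# so only the ordered items become items (no "apple=0" style labels)
a_tran <- as(m, "transactions")
inspect(a_tran)

rules_lgl <- apriori(a_tran, parameter = list(minlen = 2, supp = 0.5, conf = 0.5))
inspect(rules_lgl)
```

This version keeps the empty order 4 as a transaction, so support is computed over all six orders. The list built with `split()` has only five entries, since order 4 contains no 1s, so the support values from the two routes can differ.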