
How to turn arules apriori output into a dataframe in R


I have the following dataframe - CTVU.

MMGID_5    EMAIL
2341       [email protected]
50         [email protected]
311        [email protected]
2341       [email protected]
2387       [email protected]
57         [email protected]
2329       [email protected]
2026       [email protected]
650        [email protected]
2369       [email protected]

I want to turn the rules created below back into a dataframe with two new columns: the first holding the item with the highest confidence, the second holding that confidence.

library(arules)
library(arulesViz)

CTVU <- read.csv("CTVU.csv", header = TRUE)
CTVU <- unique(CTVU[ , c(2,5) ])
CTVU <- as(split(CTVU[,"MMG5_ID"], CTVU[,"EMAIL"]), "transactions")
itemFrequencyPlot(CTVU,topN=20,type="absolute")
rules <- apriori(CTVU, parameter = list(supp = 0.001, conf = 0.1))
options(digits=2)
inspect(rules[1:5])
rules<-sort(rules, by="confidence", decreasing=TRUE)
rules <- apriori(CTVU, parameter = list(supp = 0.001, conf = 0.8,maxlen=3))

rules <- apriori(data = CTVU, parameter = list(supp = 0.001, conf = 0.01, minlen = 2),
                 appearance = list(default = "rhs", lhs = "289"),
                 control = list(verbose = FALSE))
rules<-sort(rules, decreasing=TRUE,by="confidence")
inspect(rules[1:5])

In the end I want a dataframe that looks like this:

EMAIL      MMG5_rule   Confidence
[email protected]  50          0.5
[email protected]  2341        0.2
[email protected]  2026        0.6

I did some research but wasn't able to find a solution. Can someone help me figure out how to do this?


Solution

  • You don't need to turn your arules output into a data.frame. If you have a new customer with a list of bought items, you can find relevant association rules with arules::subset:

    newCustomer <- c("toothbrush", "chocolate", "gummibears")
    arules::subset(aprioriResults, subset = lhs %in% newCustomer)
    

    More info on that in the subset help:

    subset works on the rows/itemsets/rules of x. The expression given in subset will be evaluated using x, so the items (lhs/rhs/items) and the columns in the quality data.frame can be directly referred to by their names.

    Important operators to select itemsets containing items specified by their labels are %in% (select itemsets matching any given item), %ain% (select only itemsets matching all given item) and %pin% (%in% with partial matching).

    However, what a customer is likely to buy next is, in my view, better answered with sequence mining. Luckily, the arulesSequences package does that, and it's by the same authors, so little extra work is required.
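
If a data.frame is still wanted, as the question asks, arules can convert a rules object directly. The sketch below is only an illustration: the baskets, the thresholds and the names `baskets`, `trans`, `matching`, `rules_df` and `rules_flat` are made up for this example and are not the asker's CTVU data. It first filters rules with `subset` and `%in%`, then converts the result with `DATAFRAME()`, and shows `as(rules, "data.frame")` as the alternative.

```r
library(arules)

# Toy baskets, invented for illustration (not the asker's CTVU data).
baskets <- list(
  c("toothbrush", "toothpaste"),
  c("toothbrush", "toothpaste", "chocolate"),
  c("chocolate", "gummibears"),
  c("toothbrush", "toothpaste", "gummibears"),
  c("chocolate", "gummibears", "toothpaste")
)
trans <- as(baskets, "transactions")

# Thresholds chosen only so that the toy data yields some rules.
rules <- apriori(
  trans,
  parameter = list(supp = 0.2, conf = 0.5, minlen = 2),
  control = list(verbose = FALSE)
)

# Keep rules whose left-hand side contains any item the new customer bought.
newCustomer <- c("toothbrush", "chocolate")
matching <- subset(rules, subset = lhs %in% newCustomer)

# One row per rule, with LHS and RHS in separate columns next to the
# quality measures (support, confidence, lift, ...), highest confidence first.
rules_df <- DATAFRAME(sort(matching, by = "confidence"), separate = TRUE)
head(rules_df)

# Alternative: coercion keeps each rule as a single "{lhs} => {rhs}" string
# in a column named `rules`.
rules_flat <- as(rules, "data.frame")
```

From `rules_df`, the `RHS` and `confidence` values of the first row give the recommended item and its confidence for that basket. Repeating the `subset` step per customer would build up the table described in the question, though for many customers that loop may be slow.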