I have the following data frame, CTVU:
MMG5_ID EMAIL
2341 [email protected]
50 [email protected]
311 [email protected]
2341 [email protected]
2387 [email protected]
57 [email protected]
2329 [email protected]
2026 [email protected]
650 [email protected]
2369 [email protected]
I want to turn the rules created below back into a data frame that, for each EMAIL, adds two new columns: the item from the rule with the highest confidence, and that rule's confidence.
library(arules)
library(arulesViz)
# Read the raw data and keep only the item (MMG5_ID) and customer (EMAIL) columns
CTVU <- read.csv("CTVU.csv", header = TRUE)
CTVU <- unique(CTVU[ , c(2,5) ])
# One transaction per EMAIL; note that this overwrites the data frame with a transactions object
CTVU <- as(split(CTVU[,"MMG5_ID"], CTVU[,"EMAIL"]), "transactions")
itemFrequencyPlot(CTVU,topN=20,type="absolute")
# Exploratory runs with different support/confidence thresholds
rules <- apriori(CTVU, parameter = list(supp = 0.001, conf = 0.1))
options(digits=2)
inspect(rules[1:5])
rules<-sort(rules, by="confidence", decreasing=TRUE)
rules <- apriori(CTVU, parameter = list(supp = 0.001, conf = 0.8,maxlen=3))
# Rules with item 289 on the left-hand side and any item on the right-hand side
rules<-apriori(data=CTVU, parameter=list(supp=0.001,conf = 0.01,minlen=2),
appearance = list(default="rhs",lhs="289"),
control = list(verbose=FALSE))
rules<-sort(rules, decreasing=TRUE,by="confidence")
inspect(rules[1:5])
So in the end I want a data frame that looks like this:
EMAIL MMG5_rule Confidence
[email protected] 50 0.5
[email protected] 2341 0.2
[email protected] 2026 0.6
I did some research but wasn't able to find a solution. Can someone help me figure out how to do this?
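A minimal sketch of the requested transformation, under a few assumptions: the per-EMAIL baskets are kept in a list (the hypothetical name `baskets`) instead of being overwritten by the transactions object, and the toy data below are made up for illustration. The lhs and rhs of each rule are pulled out with `as(..., "list")` and the confidences with `quality(rules)$confidence`; for each customer it picks the highest-confidence rule whose whole left-hand side is already in the basket and whose right-hand side is not. (For a flat table of all rules, `as(rules, "data.frame")` also exists, but it stores each rule as a single string.)

```r
library(arules)

# Hypothetical toy stand-in for CTVU: one row per (MMG5_ID, EMAIL) pair.
# Item IDs are kept as character labels here.
ctvu <- data.frame(
  MMG5_ID = c("50", "311", "50", "2341", "311", "2341", "2026", "50", "311", "2026"),
  EMAIL   = c("a@example.com", "a@example.com", "b@example.com", "b@example.com",
              "c@example.com", "c@example.com", "c@example.com",
              "d@example.com", "d@example.com", "d@example.com"),
  stringsAsFactors = FALSE
)

# Keep the baskets so results can be joined back to EMAIL later.
baskets <- split(ctvu$MMG5_ID, ctvu$EMAIL)
trans <- as(baskets, "transactions")

rules <- apriori(trans,
                 parameter = list(supp = 0.1, conf = 0.1, minlen = 2),
                 control = list(verbose = FALSE))

lhs_items <- as(lhs(rules), "list")          # list of character vectors
rhs_item  <- unlist(as(rhs(rules), "list"))  # apriori rules have a single rhs item
conf      <- quality(rules)$confidence

# Highest-confidence applicable rule for one customer: its lhs must be fully
# contained in the basket and its rhs must not have been bought yet.
best_for <- function(email) {
  items <- baskets[[email]]
  ok <- vapply(lhs_items, function(l) all(l %in% items), logical(1)) &
        !(rhs_item %in% items)
  if (!any(ok)) {
    return(data.frame(EMAIL = email, MMG5_rule = NA_character_,
                      Confidence = NA_real_, stringsAsFactors = FALSE))
  }
  i <- which(ok)[which.max(conf[ok])]
  data.frame(EMAIL = email, MMG5_rule = rhs_item[[i]],
             Confidence = conf[[i]], stringsAsFactors = FALSE)
}

result <- do.call(rbind, lapply(names(baskets), best_for))
print(result)
```

Customers for whom no rule applies get `NA` in both new columns rather than being dropped, so the result keeps one row per EMAIL.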
You don't need to turn your arules output into a data.frame. If you have a new customer with a list of bought items, you can find the relevant association rules with arules::subset:
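# aprioriResults: the rules object returned by apriori()
newCustomer <- c("toothbrush", "chocolate", "gummibears")
# Keep only rules whose left-hand side contains at least one of these items
arules::subset(aprioriResults, subset = lhs %in% newCustomer)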
More info on that in the subset help:
subset works on the rows/itemsets/rules of x. The expression given in subset will be evaluated using x, so the items (lhs/rhs/items) and the columns in the quality data.frame can be directly referred to by their names.
Important operators to select itemsets containing items specified by their labels are %in% (select itemsets matching any given item), %ain% (select only itemsets matching all given item) and %pin% (%in% with partial matching).
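A small sketch contrasting the operators quoted above, plus a filter on a quality column. The baskets and item labels are hypothetical toy data, not from the question.

```r
library(arules)

# Hypothetical toy baskets
baskets <- list(
  c("toothbrush", "toothpaste"),
  c("toothbrush", "toothpaste", "floss"),
  c("chocolate", "gummibears"),
  c("chocolate", "gummibears", "toothbrush"),
  c("toothpaste", "floss")
)
trans <- as(baskets, "transactions")
rules <- apriori(trans, parameter = list(supp = 0.2, conf = 0.5, minlen = 2),
                 control = list(verbose = FALSE))

newCustomer <- c("toothbrush", "chocolate")

# %in%: lhs contains at least one of the customer's items
any_match <- subset(rules, subset = lhs %in% newCustomer)
# %ain%: lhs contains all of the listed items
all_match <- subset(rules, subset = lhs %ain% newCustomer)
# Columns of the quality data.frame (here confidence) can be used by name
strong <- subset(rules, subset = lhs %in% newCustomer & confidence >= 0.8)

inspect(any_match)
```

Because every rule matched by `%ain%` is also matched by `%in%`, `all_match` is never larger than `any_match`.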
However, the question of what a customer is likely to buy next is, in my view, better answered with sequence mining. Luckily, arulesSequences is a package that does exactly that, and it's by the same authors, so little extra work is required.
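A hedged sketch of sequence mining with arulesSequences, using the small `zaki` example data shipped with the package. Applying this to CTVU would additionally require a purchase time per row (mapped to `eventID`, with EMAIL as `sequenceID`), which the question's data as shown do not include; that mapping is an assumption.

```r
library(arulesSequences)

# zaki: example sequence database bundled with arulesSequences; each
# transaction is an event with a sequenceID (customer) and an eventID (time).
data(zaki)

# Frequent sequences occurring in at least 40% of the sequences
seqs <- cspade(zaki, parameter = list(support = 0.4),
               control = list(verbose = FALSE))

# Sequential rules of the form "X, then later Y" with confidence >= 0.5
seq_rules <- ruleInduction(seqs, confidence = 0.5)

# Rules as a plain data frame (rule string plus quality measures)
print(as(seq_rules, "data.frame"))
```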