I'm having trouble converting a data frame into a transactions object. I build a data frame grouped by invoice number, with the list of products for each invoice separated by ',' (so the data frame has two columns). Up to this point everything works:
df = read.csv('Orders.csv', sep = ';', stringsAsFactors = T)
df$Document.Date = as.Date(df$Document.Date, format = '%d/%m/%Y')
# Load plyr before tidyverse so that plyr does not mask dplyr verbs
library(plyr)
library(tidyverse)
grouping_for_AA =
  data.frame(
    df %>%
      group_by(Sales.Document, Material) %>%
      dplyr::select(Sales.Document, Material, Document.Date)
  )
# Create transaction data: build a list of materials for each sales document,
# separated by ","
transactionData = ddply(grouping_for_AA, c('Sales.Document'),
                        function(df) paste(df$Material,
                                           collapse = ','))
But when I call as(data, 'transactions'), R tells me to discretize the input. So I apply as.factor to the product-list column, but then each whole transaction becomes a single factor level and (clearly) no rules can be mined.
# Drop the Sales.Document column from transactionData
transactionData$Sales.Document <- NULL
# Rename the remaining column, which holds the material lists
colnames(transactionData) = 'Material'
# Convert to factor
transactionData = data.frame(lapply(transactionData, factor))
# Create a transactions object: errors can be due to the package containing 'as'
trObj <- as(transactionData, "transactions")
I have already tried data frames in both single and basket format, but I could not solve it.
Any idea how to convert a data frame into transaction format without exporting and reloading the data?
You can try the following to convert your data.frame into a transactions dataset. I've added a fake date, though it is probably unnecessary, since you do not use it in your analysis:
data$Document.Date <- Sys.Date()
data
  Sales.Document Material Document.Date
1              1        A    2018-11-21
2              1        B    2018-11-21
3              1        C    2018-11-21
4              2        A    2018-11-21
5              2        C    2018-11-21
6              3        A    2018-11-21
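For reproducibility, the sample data printed above could be built as follows. This is only a sketch: the values are copied from the printed output, and the column types (integer IDs, character materials) are an assumption, since the original construction of `data` is not shown.

```r
# Rebuild the six-row sample shown above (values taken from the printout;
# column types are assumed, the original construction is not shown)
data <- data.frame(
  Sales.Document = c(1L, 1L, 1L, 2L, 2L, 3L),
  Material       = c("A", "B", "C", "A", "C", "A"),
  stringsAsFactors = FALSE
)

# Fake date, as in the answer; the actual value depends on the day it is run
data$Document.Date <- Sys.Date()

data
```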
Now the same grouping step as in your code. Note that you can add data.frame() at the end of the dplyr chain:
library(tidyverse)
library(plyr)
grouping_for_AA <- data %>%
  group_by(Sales.Document, Material) %>%
  dplyr::select(Sales.Document, Material, Document.Date) %>%
  data.frame()
Now you can convert it into a transactions object:
library(arules)
library(reshape2)
trans <- as(split(grouping_for_AA[,"Material"], grouping_for_AA[,"Sales.Document"]), "transactions")
inspect(trans)
    items   transactionID
[1] {A,B,C} 1
[2] {A,C}   2
[3] {A}     3
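If you already have the comma-separated strings from the question's ddply() step, a similar result can probably be reached without going back to the long format. The sketch below assumes a small stand-in for transactionData (the three rows are hypothetical and mirror the sample above); it splits each string with strsplit() and coerces the resulting list to transactions. Duplicated items within one document are dropped first, because coercing a list that contains duplicates raises an error or a warning, depending on the arules version.

```r
library(arules)

# Hypothetical stand-in for the question's transactionData after ddply()
transactionData <- data.frame(
  Sales.Document = c(1, 2, 3),
  Material       = c("A,B,C", "A,C", "A"),
  stringsAsFactors = FALSE
)

# Split each comma-separated string into a character vector of items
item_list <- strsplit(as.character(transactionData$Material), ",", fixed = TRUE)

# Name the list elements so they become the transaction IDs
names(item_list) <- transactionData$Sales.Document

# Drop repeated items within a document before coercion
item_list <- lapply(item_list, unique)

trans_from_strings <- as(item_list, "transactions")
inspect(trans_from_strings)
```

The key difference from the as.factor() attempt is that each item stays a separate element of a list, instead of the whole string becoming one factor level.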
Lastly, you can apply the apriori() function:
rules <- apriori(trans, parameter = list(supp = 0.3, conf = 0.3, target="rules", minlen=2))
inspect(rules)
    lhs      rhs support   confidence lift count
[1] {B}   => {C} 0.3333333 1.0000000  1.5  1
[2] {C}   => {B} 0.3333333 0.5000000  1.5  1
[3] {B}   => {A} 0.3333333 1.0000000  1.0  1
[4] {A}   => {B} 0.3333333 0.3333333  1.0  1
[5] {C}   => {A} 0.6666667 1.0000000  1.0  2
[6] {A}   => {C} 0.6666667 0.6666667  1.0  2
[7] {B,C} => {A} 0.3333333 1.0000000  1.0  1
[8] {A,B} => {C} 0.3333333 1.0000000  1.5  1
[9] {A,C} => {B} 0.3333333 0.5000000  1.5  1
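Several of the rules above have a lift of 1.0, meaning the two sides are statistically independent in this sample. As a possible follow-up, you could keep only rules with a higher lift and sort them. The sketch below rebuilds the three sample transactions inline so that it runs on its own; the lift threshold of 1.1 is an arbitrary choice for illustration, not something from your data.

```r
library(arules)

# Same three transactions as in the sample above, built directly from a list
trans <- as(
  list(`1` = c("A", "B", "C"), `2` = c("A", "C"), `3` = "A"),
  "transactions"
)

rules <- apriori(
  trans,
  parameter = list(supp = 0.3, conf = 0.3, target = "rules", minlen = 2)
)

# Keep rules whose lift is clearly above 1 (threshold chosen arbitrarily)
strong <- subset(rules, lift > 1.1)

# Show the remaining rules, highest confidence first
inspect(sort(strong, by = "confidence"))
```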