I want to calculate the lift for (previously defined / old) itemsets based on a list of new transactions. This can be done with the interestMeasure function.
quality(old_itemsets)$lift_ref <- interestMeasure(old_itemsets,"lift",transactions = TransMat_ref, reuse = FALSE)
The problem is that this doesn't work properly. I know this because some of my itemsets consist of only a single item. When I calculate the lift on the new transactions, the lift for these single-item itemsets should be equal to one, but it is not!
I believe the problem might be in my pre-processing. The transactions I use for generating the itemsets and the new transactions do not contain exactly the same items. Hence I added the items missing in one list to the other and vice versa. Here's an example of how it's done in one direction.
OldNames <- colnames(TransMat_old)
ReferenceNames <- colnames(TransMat_ref)
SetDiffNames <- setdiff(ReferenceNames, OldNames)
ItemsToAdd <- matrix(data = FALSE, nrow = length(TransMat_old), ncol = length(SetDiffNames))
colnames(ItemsToAdd) <- SetDiffNames
TransMat_old <- merge(TransMat_old, ItemsToAdd)
As I wrote above, I do this twice, so that both transaction matrices contain all items. The problem is: The missing items are just added as additional columns which means that they are not in the same order for the two matrices!
Could that be the reason my interestMeasure call at the top does not work?
Thanks in advance!
library(arules)
# Create transactions
data <- paste(
"item1, item2, item3",
"item1, item3",
"item1, item2",
sep="\n")
cat(data)
write(data, file = "TransMat_Old")
data <- paste(
"item2, item3, item4",
"item3, item4",
"item2, item4",
"item2",
sep="\n")
cat(data)
write(data, file = "TransMat_New")
# load transactions
TransMat_Old <- read.transactions("TransMat_Old", format = "basket", sep=",")
TransMat_New <- read.transactions("TransMat_New", format = "basket", sep=",")
# Here's my function for adding
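SameItems <- function(TransMat_Old, TransMat_New){
  OldNames <- colnames(TransMat_Old)
  NewNames <- colnames(TransMat_New)
  # Items that occur only in the second transaction set
  SetDiffNames <- setdiff(NewNames, OldNames)
  # All-FALSE columns for the missing items, one row per transaction
  ItemsToAdd <- matrix(data = FALSE, nrow = length(TransMat_Old), ncol = length(SetDiffNames))
  colnames(ItemsToAdd) <- SetDiffNames
  # merge() appends the new columns at the end, so the item order can differ between sets
  TransMat_Data_allItems <- merge(TransMat_Old, ItemsToAdd)
  return(TransMat_Data_allItems)
}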
# Add items from one matrix to the other and vice versa
Combined1 <- SameItems(TransMat_Old, TransMat_New)
Combined2 <- SameItems(TransMat_New, TransMat_Old)
# Find itemsets in the old matrix
itemsets <- apriori(data=Combined1, parameter=list(supp=0.1, maxlen=2, target="frequent itemsets"))
inspect(itemsets)
# Calculate lift for the itemsets
quality(itemsets)$lift_oldSet <- interestMeasure(itemsets, "lift", transactions = Combined1, reuse = FALSE)
# Calculate lift for old itemsets based on the new transaction matrix
quality(itemsets)$lift_newSet <- interestMeasure(itemsets, "lift", transactions = Combined2, reuse = FALSE)
# Single-item itemsets should have a lift of 1, but they do not.
inspect(itemsets)
As mentioned above: single-item itemsets should have a lift of 1 in the new dataset, but they do not.
Just get all item labels and recode the transaction sets.
all_item_labels <- union(itemLabels(TransMat_New),itemLabels(TransMat_Old))
TransMat_Old <- recode(TransMat_Old, itemLabels = all_item_labels)
TransMat_New <- recode(TransMat_New, itemLabels = all_item_labels)
Now both transaction sets have the same items in the same order and are compatible with each other.
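For completeness, here is a minimal sketch of how the recoding could fit into the workflow from the question. It rebuilds the two toy transaction sets from in-memory lists (an assumption made here to avoid writing files), recodes both to the shared label set, checks that the item order matches, and only then mines the itemsets and computes the lift on the new transactions. It is one possible arrangement, not the only one.

```r
library(arules)

# Toy data mirroring the question's example (built from lists instead of files)
old_list <- list(c("item1", "item2", "item3"),
                 c("item1", "item3"),
                 c("item1", "item2"))
new_list <- list(c("item2", "item3", "item4"),
                 c("item3", "item4"),
                 c("item2", "item4"),
                 c("item2"))
TransMat_Old <- as(old_list, "transactions")
TransMat_New <- as(new_list, "transactions")

# Recode both sets to the same item labels in the same order
all_item_labels <- union(itemLabels(TransMat_New), itemLabels(TransMat_Old))
TransMat_Old <- recode(TransMat_Old, itemLabels = all_item_labels)
TransMat_New <- recode(TransMat_New, itemLabels = all_item_labels)

# Sanity check: labels must be identical, including their order
stopifnot(identical(itemLabels(TransMat_Old), itemLabels(TransMat_New)))

# Mine itemsets on the (recoded) old transactions
itemsets <- apriori(TransMat_Old,
                    parameter = list(supp = 0.1, maxlen = 2,
                                     target = "frequent itemsets"))

# Evaluate the old itemsets against the new transactions
quality(itemsets)$lift_newSet <- interestMeasure(itemsets, "lift",
                                                 transactions = TransMat_New,
                                                 reuse = FALSE)
inspect(itemsets)
```

Note that in this toy data item1 never occurs in the new transactions, so measures for itemsets containing it may be undefined there; that is a property of the data, not of the recoding.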