I prepared a data set to read as transactions with the arules package in R. However, one of my pre-processing steps causes a problem when I use itemFrequencyPlot: the most frequent item is " ". Does anyone have suggestions for resolving this?
Original data:
data <- as.data.frame(matrix(NA, nrow = 10, ncol = 3))
colnames(data) <- c("Customer", "OrderDate", "Product")
data$Customer <- c("John", "John", "John", "Tom", "Tom", "Tom", "Sally", "Sally", "Sally", "Sally")
data$OrderDate <- c("1-Oct", "2-Oct", "2-Oct", "2-Oct","2-Oct", "2-Oct", "3-Oct", "3-Oct", "3-Oct", "3-Oct")
data$Product <- c("Milk", "Eggs", "Bread", "Butter", "Eggs", "Milk", "Bread", "Butter", "Eggs", "Wine")
I then apply the following transformation:
library(reshape2)
library(dplyr)
newdata <- data %>%
  group_by(Customer, OrderDate) %>%
  mutate(ProductValue = paste0("Product", 1:n())) %>%
  dcast(Customer + OrderDate ~ ProductValue, value.var = "Product") %>%
  arrange(OrderDate)
newdata[is.na(newdata)] <- " "
newdata <- newdata[ , 3:6]
char_cols <- sapply(newdata, is.character)
newdata[char_cols] <- lapply(newdata[char_cols], as.factor) # convert character columns to factors
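To see why the blank shows up so often, here is a small base-R sketch (no dplyr or reshape2 needed) that counts how many padding cells the wide layout creates. The names `basket_size`, `n_cols` and `padding` are only for illustration.

```r
# Same toy data as in the question
data <- data.frame(
  Customer  = c("John", "John", "John", "Tom", "Tom", "Tom",
                "Sally", "Sally", "Sally", "Sally"),
  OrderDate = c("1-Oct", "2-Oct", "2-Oct", "2-Oct", "2-Oct", "2-Oct",
                "3-Oct", "3-Oct", "3-Oct", "3-Oct"),
  Product   = c("Milk", "Eggs", "Bread", "Butter", "Eggs", "Milk",
                "Bread", "Butter", "Eggs", "Wine"),
  stringsAsFactors = FALSE
)

# Number of products in each (OrderDate, Customer) basket
basket_size <- table(paste(data$OrderDate, data$Customer))

# The wide table gets one Product column per slot of the largest basket
n_cols <- max(basket_size)

# Every shorter basket is padded with NA, which is then replaced by " "
padding <- n_cols - basket_size
print(padding)

# Number of baskets that end up containing at least one " "
sum(padding > 0)
```

Three of the four orders get at least one " " cell, which is as many orders as contain Eggs, the most frequent real item.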
I used write.table to create a CSV file without column names so that arules can read it:
write.table(newdata, "transactions.csv", row.names = FALSE, col.names = FALSE, sep = ",")
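A base-R sketch of what ends up in the file. To stay self-contained it rebuilds an equivalent padded table with `split()` instead of the dplyr/reshape2 pipeline (an assumption; `baskets`, `wide`, `csv_file` and `has_dup` are illustrative names), writes it to a temporary file with the same write.table arguments, and reads the lines back. The " " padding is written as a real quoted field, and it repeats within a line, which is presumably why read.transactions complained until rm.duplicates = TRUE was added.

```r
data <- data.frame(
  Customer  = c("John", "John", "John", "Tom", "Tom", "Tom",
                "Sally", "Sally", "Sally", "Sally"),
  OrderDate = c("1-Oct", "2-Oct", "2-Oct", "2-Oct", "2-Oct", "2-Oct",
                "3-Oct", "3-Oct", "3-Oct", "3-Oct"),
  Product   = c("Milk", "Eggs", "Bread", "Butter", "Eggs", "Milk",
                "Bread", "Butter", "Eggs", "Wine"),
  stringsAsFactors = FALSE
)

# One character vector of products per (OrderDate, Customer) order
baskets <- split(data$Product, paste(data$OrderDate, data$Customer))
n_cols <- max(lengths(baskets))

# Pad every basket with " " up to n_cols, mimicking newdata[is.na(newdata)] <- " "
wide <- t(sapply(baskets, function(b) c(b, rep(" ", n_cols - length(b)))))

# Write it the same way as in the question, then read the raw lines back
csv_file <- tempfile(fileext = ".csv")
write.table(wide, csv_file, row.names = FALSE, col.names = FALSE, sep = ",")
csv_lines <- readLines(csv_file)
print(csv_lines)

# Does any line contain the same field more than once?
fields <- strsplit(csv_lines, ",", fixed = TRUE)
has_dup <- vapply(fields, function(f) anyDuplicated(f) > 0, logical(1))
print(has_dup)
```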
Then I used the arules package to read the CSV file as transactions:
library(arules)
transactiondata <- read.transactions("transactions.csv", sep = ",", format = "basket")
This does not work: it throws an error. After reading previous questions on Stack Overflow, I was able to resolve it as follows:
transactiondata <- read.transactions("transactions.csv", sep = ",", format = "basket", rm.duplicates = TRUE)
itemFrequencyPlot(transactiondata, topN = 5)
The resulting plot shows " " as the most frequent item, which is not actually the case; it is an artifact of my pre-processing. Suggestions to resolve it would be greatly appreciated!
I would do it this way (following the examples in the manual page for transactions):
data_list <- split(data$Product, paste(data$OrderDate, data$Customer))
trans <- as(data_list, "transactions")
inspect(trans)
    items                    transactionID
[1] {Milk}                   1-Oct John
[2] {Bread,Eggs}             2-Oct John
[3] {Butter,Eggs,Milk}       2-Oct Tom
[4] {Bread,Butter,Eggs,Wine} 3-Oct Sally
itemFrequencyPlot(trans, topN = 5)
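If you want to check the numbers behind the plot, itemFrequency() with type = "absolute" returns how many transactions contain each item. A minimal sketch, assuming arules is installed:

```r
library(arules)

data <- data.frame(
  Customer  = c("John", "John", "John", "Tom", "Tom", "Tom",
                "Sally", "Sally", "Sally", "Sally"),
  OrderDate = c("1-Oct", "2-Oct", "2-Oct", "2-Oct", "2-Oct", "2-Oct",
                "3-Oct", "3-Oct", "3-Oct", "3-Oct"),
  Product   = c("Milk", "Eggs", "Bread", "Butter", "Eggs", "Milk",
                "Bread", "Butter", "Eggs", "Wine"),
  stringsAsFactors = FALSE
)

# One transaction per (OrderDate, Customer) pair, as in the answer
data_list <- split(data$Product, paste(data$OrderDate, data$Customer))
trans <- as(data_list, "transactions")

# Transaction counts per item; no padding item exists because none was created
freq <- itemFrequency(trans, type = "absolute")
print(sort(freq, decreasing = TRUE))
```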
Hope this helps!
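If you would rather keep a wide table like the question's newdata, the same idea works as long as the padding cells are dropped before converting. In this sketch `newdata` is rebuilt in base R as a stand-in (hypothetical, not the output of your dplyr pipeline), and `row_items` is an illustrative name; each row becomes one character vector with " ", "" and NA removed, and the list is coerced with as(..., "transactions").

```r
library(arules)

data <- data.frame(
  Customer  = c("John", "John", "John", "Tom", "Tom", "Tom",
                "Sally", "Sally", "Sally", "Sally"),
  OrderDate = c("1-Oct", "2-Oct", "2-Oct", "2-Oct", "2-Oct", "2-Oct",
                "3-Oct", "3-Oct", "3-Oct", "3-Oct"),
  Product   = c("Milk", "Eggs", "Bread", "Butter", "Eggs", "Milk",
                "Bread", "Butter", "Eggs", "Wine"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for newdata: one row per order, padded with " ", factor columns
baskets <- split(data$Product, paste(data$OrderDate, data$Customer))
n_cols <- max(lengths(baskets))
newdata <- as.data.frame(
  t(sapply(baskets, function(b) c(b, rep(" ", n_cols - length(b))))),
  stringsAsFactors = TRUE
)

# Turn each row into a character vector and drop the padding (" ", "" or NA)
row_items <- lapply(seq_len(nrow(newdata)), function(i) {
  items <- as.character(unlist(newdata[i, ], use.names = FALSE))
  items[!is.na(items) & trimws(items) != ""]
})

trans <- as(row_items, "transactions")
inspect(trans)
```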