I've been digging through existing questions for 3 days, so I've finally worked up the courage to ask here. I have a dataset of 379,584 entries and I want to feed it to "arules" in R.
It looks like this
A. If I go with format = "basket", I do the following:
sales <- read.csv("sales.csv", sep=";")
s1 <- split(sales$product_id, sales$order_id)
s1 <- unique(s1)
tr <- as(s1, "transactions")
This gives me the error "can not coerce list with transactions with duplicated items".
B. If I go with format = "single":
tr <- read.transactions("sales.csv",
sep=";", format = "single", cols = c(4,2))
I get the same error: "can not coerce list with transactions with duplicated items".
I've already checked the file for duplicates and Excel can't find any. I believe the problem is trivial, but I'm just stuck.
The unique(s1) line appears to be causing a problem in your code. Is it required?
I managed to create the transactions just by commenting out that line.
sales <- structure(list(sku = c(207426L, 207422L, 207424L, 9793L, 33186L,
72406L), product_id = c(15729L, 15725L, 15727L, 15999L, 15983L,
15992L), item_id = 1:6, order_id = c(1L, 1L, 1L, 2L, 2L, 2L)),
.Names = c("sku", "product_id", "item_id", "order_id"),
class = "data.frame", row.names = c(NA, -6L))
s1 <- split(sales$product_id, sales$order_id)
#s1 <- unique(s1)
tr <- as(s1, "transactions")
tr
transactions in sparse format with
2 transactions (rows) and
6 items (columns)
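If you really need to remove duplicates, do it within each transaction rather than on the whole list. unique(s1) only drops transactions that are identical to an earlier one (and discards the order_id names), whereas the error is about the same item appearing more than once inside a single transaction. Run this instead: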
s1 <- lapply(s1, unique)
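To see which orders trigger the error, a minimal base-R sketch along these lines may help. The toy data frame below is made up for illustration (it is not your data); it only assumes the product_id and order_id columns used above. It also hints at why Excel may have found nothing: if every row has its own item_id, no whole row is a duplicate, even when one order lists the same product_id twice.

```r
# Hypothetical toy data: order 2 contains product 15999 twice.
sales <- data.frame(
  product_id = c(15729L, 15725L, 15999L, 15999L, 15983L),
  order_id   = c(1L, 1L, 2L, 2L, 2L)
)

s1 <- split(sales$product_id, sales$order_id)

# Orders in which at least one item is repeated
# (anyDuplicated() returns 0 when a vector has no repeats).
dup_orders <- names(s1)[vapply(s1, anyDuplicated, integer(1)) > 0]
print(dup_orders)

# unique() on the list compares whole transactions,
# so the repeated item inside order 2 survives.
whole <- unique(s1)

# lapply(s1, unique) deduplicates inside each transaction
# and keeps the order_id names.
s2 <- lapply(s1, unique)
print(s2)
```

After this, as(s2, "transactions") should no longer hit the duplicated-items error. For the format = "single" route, read.transactions() also takes an rm.duplicates argument; passing rm.duplicates = TRUE is the equivalent fix there.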