How to prepare transaction data for arules


I've been digging through existing questions for 3 days already, so I've finally worked up the courage to ask here. I have a dataset of 379,584 entries and I want to feed it to "arules" in R.

It looks like this: [screenshot of the data]

The structure is the following: [screenshot of the structure]

A. If I try to go with format = "basket", I do the following:

sales <- read.csv("sales.csv", sep=";")
s1 <- split(sales$product_id, sales$order_id)
s1 <- unique(s1)

tr <- as(s1, "transactions")

This gives me the error "can not coerce list with transactions with duplicated items".

B. If I go with format = "single":

tr <- read.transactions("sales.csv",
         sep=";", format = "single", cols = c(4,2))

I get the same error: "can not coerce list with transactions with duplicated items".

I've already checked the files for duplicates and Excel can't find any. I believe the trouble is trivial but I'm just stuck.


Solution

  • Apparently the unique(s1) call is causing a problem in your code. Is it required?

    I managed to create the transactions just by commenting out that line.

    sales <- structure(list(sku = c(207426L, 207422L, 207424L, 9793L, 33186L, 
    72406L), product_id = c(15729L, 15725L, 15727L, 15999L, 15983L, 
    15992L), item_id = 1:6, order_id = c(1L, 1L, 1L, 2L, 2L, 2L)), 
    .Names = c("sku", "product_id", "item_id", "order_id"), 
    class = "data.frame", row.names = c(NA, -6L))
    
    s1 <- split(sales$product_id, sales$order_id)
    #s1 <- unique(s1)
    
    tr <- as(s1, "transactions")
    tr
    
    transactions in sparse format with
     2 transactions (rows) and
     6 items (columns)
    

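    If removing duplicates is really required, apply unique within each transaction instead. unique(s1) only drops whole transactions that are identical to an earlier one and leaves repeated items inside a transaction in place: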

    s1 <- lapply(s1, unique)
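
To see why the per-transaction unique matters, here is a minimal sketch on made-up data. The toy values (in particular the repeated product 15999 in order 2) and the names dup_rows, s1_clean and s1_wrong are assumptions for illustration, not taken from the original file. It counts repeated (order_id, product_id) pairs, which is probably what the error message is complaining about, and compares the two ways of calling unique.

```r
# Toy data (hypothetical): order 2 lists product 15999 twice.
sales <- data.frame(
  order_id   = c(1L, 1L, 1L, 2L, 2L, 2L),
  product_id = c(15729L, 15725L, 15727L, 15999L, 15999L, 15992L)
)

# Flag rows whose (order_id, product_id) pair already appeared in an earlier row.
dup_rows <- duplicated(sales[, c("order_id", "product_id")])
sum(dup_rows)  # number of repeated items inside orders

# One vector of products per order.
s1 <- split(sales$product_id, sales$order_id)

# unique() on the list compares whole orders with each other,
# so the repeated item inside order 2 is still there afterwards.
s1_wrong <- unique(s1)

# unique() applied per order removes repeated items within each order.
s1_clean <- lapply(s1, unique)

# Coerce to transactions only if arules is installed.
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  tr <- as(s1_clean, "transactions")
  print(tr)
}
```

If sum(dup_rows) is greater than zero on the real data, the same product appears more than once in at least one order. For approach B, read.transactions also accepts rm.duplicates = TRUE, which should drop such repeats while reading the file.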