
Remove column labels from a transaction object


I have a data frame df like below:

df <- data.frame(V1 = c("Prod1", "Prod2", "Prod3"),
                 V2 = c("Prod3", "Prod1", "Prod2"), 
                 V3 = c("Prod2", "Prod1", "Prod3"), 
                 City = c("City1", "City2", "City3"))

When I convert this to transaction class, using the code:

tData <- as(df, "transactions")
inspect(tData)

I get a result like below:

    items                                   transactionID
[1] {V1=Prod1,V2=Prod3,V3=Prod2,City=City1} 1            
[2] {V1=Prod2,V2=Prod1,V3=Prod1,City=City2} 2            
[3] {V1=Prod3,V2=Prod2,V3=Prod3,City=City3} 3   

This means that V1=Prod1 and V2=Prod1 are treated as separate items even though they are the same product. This causes problems when I run the apriori algorithm on the data.

How can I remove the column labels so that I get the transaction object as:

    items                                   transactionID
[1] {Prod1,Prod3,Prod2,City1} 1            
[2] {Prod2,Prod1,Prod1,City2} 2            
[3] {Prod3,Prod2,Prod3,City3} 3         

Please help.


Solution

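  • Your data format is somewhat unusual (exactly the same number of items in each transaction). Coercing a data.frame treats every column as a separate variable and creates items of the form column=value, which is why V1=Prod1 and V2=Prod1 end up as different items. To convert this correctly, you need a list of transactions instead of a data.frame.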

    library("arules")
    
    df <- data.frame(
      V1 = c("Prod1", "Prod2", "Prod3"),
      V2 = c("Prod3", "Prod1", "Prod2"), 
      V3 = c("Prod2", "Prod1", "Prod3"), 
      City = c("City1", "City2", "City3"))
    
    m <- as.matrix(df)
    l <- lapply(seq_len(nrow(m)), FUN = function(i) m[i, ])
    

    This is the list format with each transaction as a list element.

    l
    [[1]]
         V1      V2      V3    City 
    "Prod1" "Prod3" "Prod2" "City1" 
    
    [[2]]
         V1      V2      V3    City 
    "Prod2" "Prod1" "Prod1" "City2" 
    
    [[3]]
         V1      V2      V3    City 
    "Prod3" "Prod2" "Prod3" "City3" 
    

    Now it can be coerced into transactions:

    trans <- as(l, "transactions")
    inspect(trans)
    
        items                    
    [1] {City1,Prod1,Prod2,Prod3}
    [2] {City2,Prod1,Prod2}      
    [3] {City3,Prod2,Prod3} 
    

    Transactions are sets of items, so duplicate items within a transaction (for example, Prod1 appearing twice in the second row) are removed.
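
The conversion can also be written more compactly. The following is a minimal sketch rather than the only way to do it: it assumes the same df as in the question, drops the column names with unname(), and removes duplicates explicitly with unique() before coercion, so the result does not rely on arules deduplicating the items. The size() and itemLabels() calls are only there to check the result.

```r
library("arules")

df <- data.frame(
  V1 = c("Prod1", "Prod2", "Prod3"),
  V2 = c("Prod3", "Prod1", "Prod2"),
  V3 = c("Prod2", "Prod1", "Prod3"),
  City = c("City1", "City2", "City3"))

# Character matrix: one row per transaction
m <- as.matrix(df)

# One character vector per row, without column names and without duplicates
l <- lapply(seq_len(nrow(m)), function(i) unique(unname(m[i, ])))

trans <- as(l, "transactions")

# Number of distinct items in each transaction
size(trans)

# Item labels no longer carry the column prefix
itemLabels(trans)
```

Because City1, City2 and City3 are now ordinary items, they will take part in any rules mined from trans. If the city should not be treated as a product, one option is to drop that column from df before building the list.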