Convert a normal data set to a format that market basket analysis can process


I created the data set shown below so that I can apply market basket analysis (apriori()) to it:

id  name
1   mango
1   apple
1   grapes
2   apple
2   carrot
3   mango
3   apple
4   apple
4   carrot
4   grapes
5   strawberry
6   guava
6   strawberry
6   bananas
7   bananas
8   guava
8   strawberry
8   pineapple
9   mango
9   apple
9   blueberries
10  black grapes
11  pomogranate
12  black grapes
12  pomogranate
12  carrot
12  custard apple

I tried the following to convert it into a format that market basket analysis can process:

library(arules)
fact <- data.frame(lapply(frt,as.factor))
trans <- as(fact, 'transactions') 

I also tried this, and got an error:

trans1 = read.transactions(file = frt, format = "single", sep = ",",cols=c("id","name"))

Error in scan(file = file, what = "", sep = sep, quiet = TRUE, nlines = 1) : 
  'file' must be a character string or connection

The output I got is not what I expected. This is what I got:

items                transactionID
1   {name=mango}                   1  
2   {name=apple}                   2  
3   {name=grapes}                  3  
4   {name=apple}                   4  
5   {name=carrot}                  5  
6   {name=mango}                   6  
7   {name=apple}                   7  
8   {name=apple}                   8  
9   {name=carrot}                  9  
10  {name=grapes}                  10 
11  {name=strawberry}              11 
12  {name=guava}                   12 
13  {name=strawberry}              13 
14  {name=bananas}                 14 

My expected output is:

id  item
1  {mango,apple,grapes}
2  {apple,carrot}
3  {mango,apple}

and so on.

Can anyone please suggest a way to get my expected output (if that is possible), so that I can apply the apriori() algorithm?

Thanks in advance.


Solution

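  • If you are doing market basket analysis in arules, you need to construct a transactions object. read.transactions() expects a file name or a connection, not a data.frame, which is why your call failed. You can build the object from your text file like this: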

    write.csv(frt,file="temp.csv", row.names=FALSE) # say "temp.csv" is your text file
    tranx <- read.transactions(file="temp.csv",format="single", sep=",", cols=c("id","name"))
    inspect(tranx)
    #     items           transactionID
    # 1  {apple,                      
    #     grapes,                     
    #     mango}                    1 
    # 2  {black-grapes}             10
    # 3  {pomogranate}              11
    # 4  {black-grapes,               
    #     carrot,                     
    #     custard-apple,              
    #     pomogranate}              12
    

    ...or, if you have already read your text file into a data.frame, you can coerce it into a transactions object by way of a list, like this:

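    # build a list with one vector of item names per transaction id
    # (indexing by i assumes the ids are the integers 1, 2, 3, ...)
    tranx2 <- list()
    for(i in unique(frt$id)){
      tranx2[[i]] <- unlist(frt$name[frt$id==i])
    }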
    
    inspect(as(tranx2,'transactions'))
    
    #   items          
    # 1  {apple,        
    #   grapes,       
    #   mango}        
    # 2  {apple,        
    #   carrot}       
    # 3  {apple,        
    #   mango}        
    # 4  {apple,        
    #   carrot,       
    #   grapes}
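
    ...the list in the second approach can probably also be built without a loop. The following is only a sketch, assuming frt has the columns id and name from the question; the sample data and the names baskets and tranx3 are made up for illustration. It uses base R's split() to group the item names by id, and coerces the result only if arules is installed:

```r
# Small sample in the same long format as the question (id, name).
# This data and the names `baskets` / `tranx3` are illustrative assumptions.
frt <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3),
  name = c("mango", "apple", "grapes", "apple", "carrot", "mango", "apple"),
  stringsAsFactors = FALSE
)

# split() groups the item names by transaction id and returns a named list.
# as.character() guards against `name` having been read in as a factor.
baskets <- split(as.character(frt$name), frt$id)

# Drop repeated items within a basket, since a transaction is a set of items.
baskets <- lapply(baskets, unique)

str(baskets)

# Coerce to a transactions object only when arules is available.
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  tranx3 <- as(baskets, "transactions")
  inspect(tranx3)
}
```

    Because the list is named by id, this variant does not depend on the ids being consecutive integers, which the loop above does.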