
Prepare an arules transaction list


arules requires a list of transactions. Each entry in the list contains a vector of products, and not every transaction has the same number of products. It sounds like a pivot, but it's not. An example can be found here

I tried something like aggregate(dvd, by=list("ID"), FUN=c), but it fails with "arguments must have same length".

This is my data:

> dvd
   ID          Item
1   1   Sixth Sense
2   1         LOTR1
3   1 Harry Potter1
4   1    Green Mile
5   1         LOTR2
6   2     Gladiator
7   2       Patriot
8   2    Braveheart
9   3         LOTR1
10  3         LOTR2
11  4     Gladiator
12  4       Patriot
13  4   Sixth Sense
14  5     Gladiator
15  5       Patriot
16  5   Sixth Sense
17  6     Gladiator
18  6       Patriot
19  6   Sixth Sense
20  7 Harry Potter1
21  7 Harry Potter2
22  8     Gladiator
23  8       Patriot
24  9     Gladiator
25  9       Patriot
26  9   Sixth Sense
27 10   Sixth Sense
28 10          LOTR
29 10     Galdiator
30 10    Green Mile

I need a list that looks like this:

TR1     c("Sixth Sense","LOTR1","Harry Potter1","Green Mile","LOTR2")
TR2     c("Gladiator","Patriot","Braveheart")
TR3     c("LOTR1","LOTR2")
....

Solution

  • I think split will do the job for you.

    DF <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 
    4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 9L, 
    10L, 10L, 10L, 10L), Item = c("   Sixth Sense", "         LOTR1", 
    " Harry Potter1", "    Green Mile", "         LOTR2", "     Gladiator", 
    "       Patriot", "    Braveheart", "         LOTR1", "         LOTR2", 
    "     Gladiator", "       Patriot", "   Sixth Sense", "     Gladiator", 
    "       Patriot", "   Sixth Sense", "     Gladiator", "       Patriot", 
    "   Sixth Sense", " Harry Potter1", " Harry Potter2", "     Gladiator", 
    "       Patriot", "     Gladiator", "       Patriot", "   Sixth Sense", 
    "   Sixth Sense", "          LOTR", "     Galdiator", "    Green Mile"
    )), .Names = c("ID", "Item"), class = "data.frame", row.names = c(NA, 
    -30L))
    
    # Trim the whitespace padding around each item name
    DF$Item <- trimws(DF$Item)
    result <- split(DF$Item, DF$ID)
    names(result) <- paste0("TR", names(result))
    result
    ## $TR1
    ## [1] "Sixth Sense"   "LOTR1"         "Harry Potter1" "Green Mile"    "LOTR2"        
    ## 
    ## $TR2
    ## [1] "Gladiator"  "Patriot"    "Braveheart"
    ## 
    ## $TR3
    ## [1] "LOTR1" "LOTR2"
    ## 
    ## $TR4
    ## [1] "Gladiator"   "Patriot"     "Sixth Sense"
    ## 
    ## $TR5
    ## [1] "Gladiator"   "Patriot"     "Sixth Sense"
    ## 
    ## $TR6
    ## [1] "Gladiator"   "Patriot"     "Sixth Sense"
    ## 
    ## $TR7
    ## [1] "Harry Potter1" "Harry Potter2"
    ## 
    ## $TR8
    ## [1] "Gladiator" "Patriot"  
    ## 
    ## $TR9
    ## [1] "Gladiator"   "Patriot"     "Sixth Sense"
    ## 
    ## $TR10
    ## [1] "Sixth Sense" "LOTR"        "Galdiator"   "Green Mile"
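
Once the items are grouped with split, the list can be handed to arules. Below is a minimal sketch, assuming a small made-up subset of the data (dvd_small and baskets are illustrative names, not from the question). The final step uses as(..., "transactions") from arules and is skipped if the package is not installed.

```r
# Hypothetical three-transaction subset of the question's data
dvd_small <- data.frame(
  ID   = c(1L, 1L, 2L, 2L, 2L, 3L),
  Item = c("Sixth Sense", "LOTR1", "Gladiator", "Patriot", "Braveheart", "LOTR1"),
  stringsAsFactors = FALSE
)

# Group the items by transaction ID; each list element is one basket
baskets <- split(dvd_small$Item, dvd_small$ID)
names(baskets) <- paste0("TR", names(baskets))

# Baskets may have different sizes, which is fine for a transaction list
lengths(baskets)

# Convert to an arules transactions object if the package is available
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  trans <- as(baskets, "transactions")
  inspect(trans)
}
```

The list names (TR1, TR2, ...) should carry over as the transaction IDs, so there is no need to keep the ID column separately.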