
Convert R data.frame column to Arules transactions


Overview:

I need to convert the following data.frame column (t$Tags) to arules transactions:

  1. scala
  2. ios,button,swift3,compiler-errors,null
  3. c#,pass-by-reference,unsafe-pointers
  4. spring,maven,spring-mvc,spring-security,spring-java-config
  5. android,android-fragments,android-fragmentmanager
  6. scala,scala-collections
  7. python-2.7,python-3.x,matplotlib,plot

This data is already in basket format, so I followed example 3 in the arules documentation (https://cran.r-project.org/web/packages/arules/arules.pdf, page 90) and converted the column as follows:

library(arules)

######################################################################################################
# Option 1 - converting the data.frame as described in the documentation (page 90)
######################################################################################################
## example 3: creating transactions from data.frame
a_df <- data.frame(
  Tags = as.factor(c("scala",
                      "ios, button, swift3, compiler-errors, null",
                      "c#, pass-by-reference, unsafe-pointers",
                      "spring, maven, spring-mvc, spring-security, spring-java-config",
                      "android, android-fragments, android-fragmentmanager",
                      "scala, scala-collections",
                      "python-2.7, python-3.x, matplotlib, plot"))
  )
## coerce
trans3 <- as(a_df, "transactions")
rules <- apriori(trans3, parameter = list(supp = 0.1, conf = 0.5, target = "rules", minlen = 1))
rules_output <- as(rules, "data.frame")
## Result: 0 rules
######################################################################################################
# Option 2 - reading from a CSV file, which contains exactly the same data
# above without the header and the quotes
######################################################################################################
file <- "Test.csv"
trans3 <- read.transactions(file = file, sep = ",", format = "basket")
rules <- apriori(trans3, parameter = list(supp = 0.1, conf = 0.5, target = "rules", minlen = 1))
rules_output <- as(rules, "data.frame")
## Result: 198 rules

Option 1 - result = 0 rules

Option 2 - result = 198 rules


Question:

In my current task and environment I cannot save the data.frame columns to formatted flat files (CSV or otherwise) and then re-read them with read.transactions, i.e. turn Option 1 into Option 2. How do I convert the data.frame column into the correct format so that I can use the arules apriori algorithm properly?


Solution

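  • Have a look at the examples in ?transactions. You need a list of vectors of items (item labels), not a data.frame. Coercing the data.frame treats each whole string in Tags as a single item, so every transaction ends up with exactly one item, which is why Option 1 finds no rules. Split each string into its tags first: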

    items <- strsplit(as.character(a_df$Tags), ", ")
    trans3 <- as(items, "transactions")
    
    rules <- apriori(trans3, parameter = list(supp = 0.1, conf = 0.5, target = "rules", minlen = 1))
    Apriori
    
    Parameter specification:
     confidence minval smax arem  aval originalSupport maxtime support minlen maxlen
            0.5    0.1    1 none FALSE            TRUE       5     0.1      1     10
     target   ext
      rules FALSE
    
    Algorithmic control:
     filter tree heap memopt load sort verbose
        0.1 TRUE TRUE  FALSE TRUE    2    TRUE
    
    Absolute minimum support count: 0 
    
    set item appearances ...[0 item(s)] done [0.00s].
    set transactions ...[22 item(s), 7 transaction(s)] done [0.00s].
    sorting and recoding items ... [22 item(s)] done [0.00s].
    creating transaction tree ... done [0.00s].
    checking subsets of size 1 2 3 4 5 done [0.00s].
    writing ... [198 rule(s)] done [0.00s].
    creating S4 object  ... done [0.00s].
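
The split above uses ", " because a_df has a space after each comma. The column listed at the top of the question (t$Tags) has no spaces, so a slightly more tolerant split may be safer. The following is only a sketch under assumptions: tag_df is a hypothetical stand-in for the asker's data.frame, and the regular expression and the unique() call are my additions, not part of the original answer. It also compares transaction sizes to show the difference between the two coercions.

```r
library(arules)

# Hypothetical stand-in for the asker's data.frame `t`:
# comma-separated tags with no space after the commas.
tag_df <- data.frame(
  Tags = c("scala",
           "ios,button,swift3,compiler-errors,null",
           "c#,pass-by-reference,unsafe-pointers",
           "spring,maven,spring-mvc,spring-security,spring-java-config",
           "android,android-fragments,android-fragmentmanager",
           "scala,scala-collections",
           "python-2.7,python-3.x,matplotlib,plot"),
  stringsAsFactors = FALSE
)

# Direct coercion of a factor column: each whole string becomes one item,
# so every transaction has size 1.
trans_df <- as(data.frame(Tags = factor(tag_df$Tags)), "transactions")
size_df <- size(trans_df)

# Split on commas with optional surrounding whitespace, then drop
# duplicate tags within a row (assumption: duplicates are not meaningful).
items <- strsplit(as.character(tag_df$Tags), "\\s*,\\s*")
items <- lapply(items, function(x) unique(trimws(x)))

# Coerce the list of item vectors; transaction sizes now match the tag counts.
trans_list <- as(items, "transactions")
size_list <- size(trans_list)
```

Comparing size_df with size_list (or calling inspect() on both objects) should make the problem with Option 1 visible: one item per row in the first case, one item per tag in the second.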