
How to extract information from apriori results in R (association rules)


I am doing some association rule mining in R and want to extract my results so I can build reports. My results look like this:

> inspect(rules[1:3])
  lhs         rhs          support confidence lift
1 {apples} => {oranges}    0.00029       0.24  4.4
2 {apples} => {pears}      0.00022       0.18 45.6
3 {apples} => {pineapples} 0.00014       0.12  1.8

How do I extract the "rhs" here, i.e. a vector of oranges, pears and pineapples?

Further, how do I extract information out of the summary, i.e.

> summary(rules)

The data type is "S4". I have no problem extracting values when the output is a list etc., but how do you do the equivalent here?

set of 3 rules

rule length distribution (lhs + rhs):sizes
2 
3 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      2       2       2       2       2       2 

I want to extract the "3" from "set of 3 rules".

I have gotten as far as using "@" (see "What does the @ symbol mean in R?").

But once I use that, how do I turn my results into a vector, i.e. so that

inspect(rules@rhs)
1 {oranges}
2 {pears}
3 {pineapples}

becomes a character vector of length 3?


Solution

  • inspect doesn't return anything; it just prints its output. When this happens, you can use capture.output to save the printed output as a character vector (one string per printed line). For example, to get the rhs:

    library(arules)
    data(Adult)
    rules <- apriori(Adult, parameter = list(support = 0.4))
    inspect(rules[1:3])
    #   lhs    rhs                              support confidence lift
    # 1 {}  => {race=White}                   0.8550428  0.8550428    1
    # 2 {}  => {native-country=United-States} 0.8974243  0.8974243    1
    # 3 {}  => {capital-gain=None}            0.9173867  0.9173867    1
    
    ## Capture it, and extract rhs
    out <- capture.output(inspect(rules[1:3]))
    gsub("[^{]+\\{([^}]*)\\}[^{]+\\{([^}]*)\\}.*", "\\2", out)[-1]
    # [1] "race=White"                   "native-country=United-States"
    # [3] "capital-gain=None"           
    

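    However, you can access this information directly from the rules object with the function rhs, which returns an itemMatrix. Its itemInfo slot holds the labels of all items in the data set (115 here), not only those that appear in the rules: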

    str(rhs(rules)@itemInfo)
    # 'data.frame': 115 obs. of  3 variables:
    #  $ labels   :Class 'AsIs'  chr [1:115] "age=Young" "age=Middle-aged" "age=Senior" "age=Old" ...
    #  $ variables: Factor w/ 13 levels "age","capital-gain",..: 1 1 1 1 13 13 13 13 13 13 ...
    #  $ levels   : Factor w/ 112 levels "10th","11th",..: 111 63 92 69 30 54 65 82 90 91 ...
    

    In general, use str to see what objects are made of so you can decide how to extract components.
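
For the specific pieces the question asks about, the accessor functions in arules avoid parsing printed output altogether. The following is a minimal sketch, assuming the arules package is installed and reusing the Adult example from above; the variable names (top, n_rules, rhs_items, q, rules_df) and the verbose = FALSE control setting are illustrative choices, not part of the original answer.

```r
# Assumes the arules package is installed; the Adult data set ships with it.
library(arules)

data(Adult)
rules <- apriori(Adult, parameter = list(support = 0.4),
                 control = list(verbose = FALSE))  # suppress progress output
top <- rules[1:3]

# Number of rules, i.e. the "3" in "set of 3 rules"
n_rules <- length(top)

# Right-hand sides as a character vector, braces included (e.g. "{race=White}")
rhs_labels <- labels(rhs(top))

# Strip the surrounding braces to keep only the item names
rhs_items <- gsub("^\\{|\\}$", "", rhs_labels)

# Quality measures (support, confidence, lift, ...) as an ordinary data frame
q <- quality(top)

# Everything in one data frame: a "rules" column plus the quality measures
rules_df <- as(top, "data.frame")
```

From here the results are plain R objects: rhs_items is a character vector of length 3, and q or rules_df can be passed to whatever reporting code you already use for data frames.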