R association rules with the arules package: how to split a rule formula into vectors of elements?


I have a data.frame that I obtain by coercing an object of class rules in this way:

trx.cpf.rules.df <- as(trx.cpf.rules, "data.frame")

(You can build the trx.cpf.rules.df object from the structure dputed here).

The head of this data frame looks like this:

> head(trx.cpf.rules.df)
                                                      rules   support confidence     lift
66 {Product_Group_1,Product_Group_49} => {Product_Group_48} 0.1060016  0.7371274 6.683635
12                 {Product_Group_48} => {Product_Group_49} 0.1067810  0.9681979 6.386621
68 {Product_Group_1,Product_Group_23} => {Product_Group_49} 0.1079501  0.9052288 5.971252
16                 {Product_Group_23} => {Product_Group_49} 0.1098987  0.8392857 5.536265
71 {Product_Group_1,Product_Group_23} => {Product_Group_34} 0.1024942  0.8594771 4.702384
19                 {Product_Group_34} => {Product_Group_23} 0.1079501  0.5906183 4.510496

Is there a fast way (a dedicated function or something similar) to convert each element of trx.cpf.rules.df$rules into two vectors containing the rule's elements, one for the left-hand side and one for the right-hand side? For example, for the first row it would be:

> (lhs.el <- c("Product_Group_1", "Product_Group_49"))
[1] "Product_Group_1"  "Product_Group_49"
> (rhs.el <- c("Product_Group_48"))
[1] "Product_Group_48"

Solution

  • This will give you a list structure with lhs/rhs vectors:

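    # Split each rule string at " => " into its two sides, then strip the
    # braces and split each side at the commas.
    l <- lapply(strsplit(as.character(trx.cpf.rules.df$rules), " => ", fixed = TRUE), function(x) {
      strsplit(gsub("[{}]", "", x), ",", fixed = TRUE)
    })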
    

    To inspect the first rule:

    l[[1]]
    # [[1]]
    # [1] "Product_Group_1"  "Product_Group_49"
    # 
    # [[2]]
    # [1] "Product_Group_48"
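
If positional access like l[[1]][[2]] feels fragile, the same splitting can be wrapped in a small helper that names the two sides. This is only a sketch: split_rule() and rules_chr are hypothetical names introduced here (not part of arules), and the sample strings are copied from the head() output in the question.

```r
# Sample rule strings copied from the head() output in the question.
rules_chr <- c(
  "{Product_Group_1,Product_Group_49} => {Product_Group_48}",
  "{Product_Group_48} => {Product_Group_49}",
  "{Product_Group_1,Product_Group_23} => {Product_Group_49}"
)

# split_rule() is a hypothetical helper, not an arules function.
# It assumes the "{a,b} => {c}" string format shown in the question.
split_rule <- function(rule) {
  # Separate the left-hand side from the right-hand side.
  sides <- strsplit(rule, " => ", fixed = TRUE)[[1]]
  # Drop the braces, then split each side into its items.
  items <- strsplit(gsub("[{}]", "", sides), ",", fixed = TRUE)
  setNames(items, c("lhs", "rhs"))
}

parsed <- lapply(rules_chr, split_rule)

parsed[[1]]$lhs
parsed[[1]]$rhs
```

With the names in place, parsed[[i]]$lhs and parsed[[i]]$rhs give the two vectors asked for in the question.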
    

    To inspect the left-hand sides of the first few rules:

    head(sapply(l, "[", 1))
    # [[1]]
    # [1] "Product_Group_1"  "Product_Group_49"
    # 
    # [[2]]
    # [1] "Product_Group_48"
    # 
    # [[3]]
    # [1] "Product_Group_1"  "Product_Group_23"
    # 
    # [[4]]
    # [1] "Product_Group_23"
    # 
    # [[5]]
    # [1] "Product_Group_1"  "Product_Group_23"
    # 
    # [[6]]
    # [1] "Product_Group_34"
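
The right-hand sides can be pulled out the same way, and once both sides are plain lists of character vectors they are easy to query. The following is a self-contained sketch that rebuilds l from the six rule strings shown in the question; rules_chr, n_lhs and has_pg1 are illustrative names, not anything from arules.

```r
# The six rule strings from the head() output in the question.
rules_chr <- c(
  "{Product_Group_1,Product_Group_49} => {Product_Group_48}",
  "{Product_Group_48} => {Product_Group_49}",
  "{Product_Group_1,Product_Group_23} => {Product_Group_49}",
  "{Product_Group_23} => {Product_Group_49}",
  "{Product_Group_1,Product_Group_23} => {Product_Group_34}",
  "{Product_Group_34} => {Product_Group_23}"
)

# Same splitting as in the answer above.
l <- lapply(strsplit(rules_chr, " => ", fixed = TRUE), function(x) {
  strsplit(gsub("[{}]", "", x), ",", fixed = TRUE)
})

# "[[" extracts the character vectors themselves (element 1 = lhs, 2 = rhs).
lhs <- lapply(l, "[[", 1)
rhs <- lapply(l, "[[", 2)

# Number of items on each left-hand side.
n_lhs <- lengths(lhs)

# Which rules have Product_Group_1 on the left-hand side? Matching whole
# items avoids false hits on names that merely start the same way.
has_pg1 <- vapply(lhs, function(items) "Product_Group_1" %in% items, logical(1))
```

If the original rules object is still available, it may be simpler to skip string parsing altogether: arules provides lhs() and rhs() accessors, and according to the package documentation the resulting item sets can be coerced with as(..., "list"). That route was not tested against the data in this question.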