
Error deploying Azure ML experiment with R script for mining association rules


I have created a new experiment in Azure Machine Learning Studio that uses the Execute R Script module to mine association rules from the starting dataset. For this experiment I used the R version Microsoft R Open 3.2.2.

I first wrote and tested the function used in the experiment in RStudio, where it ran without any problem. This is the structure of my experiment: [image: experiment]

This is the part of the code inside the module on Azure ML that works properly in RStudio:

# Map 1-based optional input ports to variables
dataset1 <- maml.mapInputPort(1) # class: data.frame

library("arules")
library("sqldf")

x <- sqldf('select ID_Ordine, AnnoOrdine, ZonaCommerciale, Modello, SUM(Qta) as Qta 
            from dataset1 group by ID_Ordine, Modello order by ID_Ordine')

a_list1 <- transform(x, Modello = as.factor(Modello),
                     ID_Ordine = as.factor(ID_Ordine)) 
transactions <- as(split(x[,"Modello"], x[,"ID_Ordine"]), "transactions")
rules <- sort(apriori(transactions,
                        parameter = list(supp = 0.1, conf = 0.1, target = "rules",
                                         maxlen = 5)), by="lift")
gi <- generatingItemsets(rules)  # itemset (lhs + rhs) behind each rule
rules <- rules[!duplicated(gi)]  # remove inverse duplicated rules; safe even when there are none

#create a dataframe to be used as output
result <- data.frame(label_lhs = labels(lhs(rules)), 
                     label_rhs = labels(rhs(rules)),
                     count = quality(rules)["count"])
               
# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("result");

If I exclude the line count = quality(rules)["count"] from the code (the statement that imports the count column into the output data frame), the experiment works correctly. When I also import the count column, the execution of the experiment fails with the following error: [image: error message]

Does anyone know how to fix this error, or an alternative way to select the count column from the arules object that Azure ML recognizes?

Thanks for any suggestions


Solution

  • The count column is not calculated by the function apriori() in this version of the package arules, so I calculated it in this way, inverting the formula used to calculate the support:

    #create a dataframe to be used as output
    result <- data.frame(label_lhs = labels(lhs(rules)), 
                         label_rhs = labels(rhs(rules)),
                         count = quality(rules)$support*length(transactions))
    

    because the support is calculated with the following formula:

    support = (number of transactions with A&B)/(number of total transactions)
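The same inversion can be wrapped in a small helper. This is only a sketch under stated assumptions: `rule_counts` is a hypothetical name, `quality_df` stands in for `quality(rules)`, and `n_transactions` stands in for `length(transactions)`. It assumes that an arules release which does report the count exposes it as a `count` column, and otherwise falls back to support times the number of transactions, rounded to remove floating-point noise.

```r
# Hypothetical helper (not part of arules): absolute count for each rule.
# quality_df     : data frame like quality(rules), with at least a `support` column
# n_transactions : total number of transactions, e.g. length(transactions)
rule_counts <- function(quality_df, n_transactions) {
  if ("count" %in% names(quality_df)) {
    # Assumption: the installed arules version already reports the count.
    return(as.integer(quality_df$count))
  }
  # Invert support = count / n_transactions.
  # round() removes floating-point noise before converting to integer.
  as.integer(round(quality_df$support * n_transactions))
}

# Toy quality table that mimics the situation above (no count column).
q_old <- data.frame(support = c(0.3, 0.1), confidence = c(0.6, 0.5))
counts_old <- rule_counts(q_old, 10)

# Toy quality table that already carries a count column.
q_new <- data.frame(support = c(0.3, 0.1), count = c(3, 1))
counts_new <- rule_counts(q_new, 10)
```

Inside the Execute R Script module this would be used as `count = rule_counts(quality(rules), length(transactions))` when building the output data frame.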