Loop from separate dataframe to write .CSV in R


I'm trying to run an association rule model for several different inputs and write each output to a separate CSV file. I would like to look up the model input from a separate dataframe and repeat the job until the last value has been processed.

Dataframe CTVU

MMGID_5    EMAIL
2341       [email protected]
50         [email protected]
311        [email protected]
2341       [email protected]
2387       [email protected]
57         [email protected]
2329       [email protected]
2026       [email protected]
650        [email protected]
2369       [email protected]

Here is the model

# Loading packages
library(arules)
library(arulesViz)

# Reading in data
CTVU <- read.csv("CTVU.csv", header = TRUE)
CTVU <- unique(CTVU[ , c(2,5) ])
CTVU <- as(split(CTVU[,"MMG5_ID"], CTVU[,"EMAIL"]), "transactions")

# model
rules <- apriori(CTVU, parameter = list(supp = 0.001, conf = 0.8, maxlen = 3))
rules <- sort(rules, by = "confidence", decreasing = TRUE)

Instead of manually declaring 2341 in appearance = list(default="rhs",lhs="2341") and changing the file name each time a new value is declared, I would like to use a loop to run this process x times.

rules <- apriori(data = CTVU,
                 parameter = list(supp = 0.001, conf = 0.01, minlen = 2),
                 appearance = list(default = "rhs", lhs = "2341"),
                 control = list(verbose = FALSE))
rules <- sort(rules, decreasing = TRUE, by = "confidence")
inspect(rules[1:5])

# create rules into data.frame and write as CSV file
CTVR <- as(rules, "data.frame")
write.csv(CTVR, file = "2341_Basket.csv", row.names = FALSE)

Dataframe MMGID to look the values up from:

MMGID
2341       
50         
311       

Is this possible?


Solution

  • Pass the values to iterate over as a vector to the loop. Here that vector is the column of the MMGID dataframe that holds the values.

    Below is an lapply() approach that writes one CSV per value and returns a list of the corresponding rules data frames. I do not know the name of that column, because your post uses MMGID for both the dataframe and its column, so fill in Col below as needed:

    # ITERATE THROUGH MMGID COLUMN VALUES 
    rules_dflist <- lapply(MMGID$Col, function(i) {
    
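        # Mine rules whose left-hand side is the current MMGID value
        rules <- apriori(data = CTVU,
                         parameter = list(supp = 0.001, conf = 0.01, minlen = 2),
                         appearance = list(default = "rhs", lhs = as.character(i)),
                         control = list(verbose = FALSE))
        rules <- sort(rules, decreasing = TRUE, by = "confidence")
        # head() avoids a subscript error when fewer than 5 rules are found
        inspect(head(rules, 5))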
    
        # create rules into data.frame and write as CSV file
        CTVR <- as(rules, "data.frame")
        write.csv(CTVR, file = paste0(i,"_Basket.csv"), row.names = FALSE)
        return(CTVR)
    
    })
    
    # NAME EACH ELEMENT TO CORRESPONDING MMGID COL VALUE
    rules_dflist <- setNames(rules_dflist, MMGID$Col)
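
The same pattern can also be written as a plain for loop with two guards, which may help if some lookup values never occur in the transactions or produce no rules. This is only a sketch under stated assumptions: the toy `purchases` data, the `ids` vector, the value "9999", and the `out_dir` location are all made up for illustration and stand in for CTVU and the MMGID column. Depending on the arules version, `apriori()` may stop with an error when `appearance` names an item label that does not exist, so the loop checks `itemLabels()` first.

```r
library(arules)

# Toy stand-in for the CTVU data (hypothetical values, not the real data)
purchases <- data.frame(
  EMAIL = c("a", "a", "b", "b", "c", "c", "c"),
  ITEM  = c("2341", "50", "2341", "50", "2341", "311", "50"),
  stringsAsFactors = FALSE
)
trans <- as(split(purchases$ITEM, purchases$EMAIL), "transactions")

# Values to iterate over; "9999" is deliberately absent from the data
ids <- c("2341", "50", "9999")

out_dir <- tempdir()   # assumption: write the CSVs to a temporary folder
results <- list()

for (id in ids) {
  # Skip values that are not item labels in the transactions
  if (!(id %in% itemLabels(trans))) next

  rules <- apriori(trans,
                   parameter = list(supp = 0.001, conf = 0.01, minlen = 2),
                   appearance = list(default = "rhs", lhs = id),
                   control = list(verbose = FALSE))

  # Skip values that yield no rules, so no empty CSV is written
  if (length(rules) == 0) next

  rules <- sort(rules, by = "confidence", decreasing = TRUE)
  df <- as(rules, "data.frame")
  write.csv(df, file = file.path(out_dir, paste0(id, "_Basket.csv")),
            row.names = FALSE)
  results[[id]] <- df
}
```

Because results are stored under `results[[id]]`, the list is named by value as it is built and skipped values simply do not appear, so no separate setNames() step is needed.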