I have a large dataset and I'm trying to mine association rules between the variables.
My problem is that I have 160 variables among which I have to look for association rules, and I also have more than 1800 item-sets.
Furthermore, my variables are continuous. To mine association rules I usually use the apriori algorithm but, as is well known, this algorithm requires categorical variables.
Does anyone have any suggestions on what kind of algorithm I can use in this case?
A restricted example of my dataset is the following:
ID_Order  Model  ordered quantity
A.1       typeX  20
A.1       typeZ  10
A.1       typeY  5
B.2       typeX  16
B.2       typeW  12
C.3       typeZ  1
D.4       typeX  8
D.4       typeG  4
...
My goal is to mine association rules and correlations between different products, maybe with a neural network algorithm in R. Does anyone have any suggestions on how to solve this problem?
Thanks in advance
You can create transactions from your dataset like this:
library(dplyr)
This function is used to get the transactions per ID_Order:
concat <- function(x) {
  return(list(as.character(x)))
}
Group df by ID_Order and concatenate. pull() returns the concatenated Models in a list.
a_list <- df %>%
  group_by(ID_Order) %>%
  summarise(concat = concat(Model)) %>%
  pull(concat)
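Set the names to ID_Order. This assumes df is already sorted by ID_Order, because group_by() returns the groups in sorted order:
names(a_list) <- unique(df$ID_Order)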
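As a reproducible starting point, here is a minimal sketch that rebuilds the sample rows from the question as df and creates the same kind of named list with base R's split() instead of dplyr. The column name Quantity is an assumption standing in for "ordered quantity". Because split() names each element after its own order ID, it does not depend on the row order of df.

```r
# Sample rows copied from the question. `Quantity` is an assumed column
# name for "ordered quantity"; the real data may use a different one.
df <- data.frame(
  ID_Order = c("A.1", "A.1", "A.1", "B.2", "B.2", "C.3", "D.4", "D.4"),
  Model    = c("typeX", "typeZ", "typeY", "typeX", "typeW", "typeZ",
               "typeX", "typeG"),
  Quantity = c(20, 10, 5, 16, 12, 1, 8, 4),
  stringsAsFactors = FALSE
)

# split() groups the Model values by ID_Order and returns a list whose
# names are the order IDs, one character vector of models per order.
a_list <- split(df$Model, df$ID_Order)

str(a_list)
```

The resulting a_list can be passed to as(a_list, "transactions") in the same way as the dplyr version.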
Then you can use the package arules:
library(arules)
Get an object of class transactions:
transactions <- as(a_list, "transactions")
Extract the rules. You can set the minimum support and minimum confidence with supp and conf, respectively.
rules <- apriori(transactions,
                 parameter = list(supp = 0.1, conf = 0.5, target = "rules"))
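To make the supp and conf thresholds concrete, the following sketch computes support, confidence and lift by hand for one candidate rule, {typeZ} => {typeX}, using only base R. The list orders and the helper support() are illustrative names that are not part of arules; the transactions are the four sample orders from the question.

```r
# The four sample orders from the question, one character vector each.
orders <- list(
  A.1 = c("typeX", "typeZ", "typeY"),
  B.2 = c("typeX", "typeW"),
  C.3 = c("typeZ"),
  D.4 = c("typeX", "typeG")
)

# Hypothetical helper: fraction of orders containing every item in `items`.
support <- function(items) {
  mean(vapply(orders, function(o) all(items %in% o), logical(1)))
}

# Rule {typeZ} => {typeX}
supp_rule  <- support(c("typeZ", "typeX"))   # orders with both items
confidence <- supp_rule / support("typeZ")   # share of typeZ orders with typeX
lift       <- confidence / support("typeX")  # confidence relative to typeX's base rate

# support 0.25, confidence 0.5, lift about 0.667
print(c(support = supp_rule, confidence = confidence, lift = lift))
```

With supp = 0.1 and conf = 0.5 this rule just passes both thresholds, while its lift below 1 indicates that typeX is less frequent in orders containing typeZ than it is overall.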
To inspect the rules use:
inspect(rules)
And this is what you get:
     lhs              rhs     support confidence lift      count
[1]  {}            => {typeZ} 0.50    0.50       1.0000000 2
[2]  {}            => {typeX} 0.75    0.75       1.0000000 3
[3]  {typeW}       => {typeX} 0.25    1.00       1.3333333 1
[4]  {typeG}       => {typeX} 0.25    1.00       1.3333333 1
[5]  {typeY}       => {typeZ} 0.25    1.00       2.0000000 1
[6]  {typeZ}       => {typeY} 0.25    0.50       2.0000000 1
[7]  {typeY}       => {typeX} 0.25    1.00       1.3333333 1
[8]  {typeZ}       => {typeX} 0.25    0.50       0.6666667 1
[9]  {typeY,typeZ} => {typeX} 0.25    1.00       1.3333333 1
[10] {typeX,typeY} => {typeZ} 0.25    1.00       2.0000000 1
[11] {typeX,typeZ} => {typeY} 0.25    1.00       4.0000000 1
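These rules use only Model, so the ordered quantity is ignored. If the quantity should take part in the rules, one possible approach is to bin it into categories first, so that apriori sees categorical items. The sketch below does this with base R's cut(); the column name Quantity, the break points and the labels are all assumptions chosen for illustration and would need to be picked from the real data.

```r
# Sample rows copied from the question. `Quantity` is an assumed column
# name for "ordered quantity".
df <- data.frame(
  ID_Order = c("A.1", "A.1", "A.1", "B.2", "B.2", "C.3", "D.4", "D.4"),
  Model    = c("typeX", "typeZ", "typeY", "typeX", "typeW", "typeZ",
               "typeX", "typeG"),
  Quantity = c(20, 10, 5, 16, 12, 1, 8, 4),
  stringsAsFactors = FALSE
)

# Bin the continuous quantity into three categories. The break points
# (0, 5], (5, 15], (15, Inf] and the labels are arbitrary assumptions.
df$QtyBin <- cut(df$Quantity,
                 breaks = c(0, 5, 15, Inf),
                 labels = c("low", "medium", "high"))

# Combine model and bin into a single categorical item, e.g. "typeX=high".
df$Item <- paste(df$Model, df$QtyBin, sep = "=")

# One character vector of items per order, named by ID_Order.
a_list <- split(df$Item, df$ID_Order)

str(a_list)
```

The resulting list can be converted with as(a_list, "transactions") and mined with apriori as before. Note that finer bins create more distinct items, which lowers the support of each one.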