
arules CBA with ordered factor


I am trying to run the CBA() classifier (package arulesCBA) on a data frame consisting of binary variables (e.g. man/woman) and ordered factors (e.g. age group 1-5). When I pass the data.frame, with the variables stored as factors and ordered factors, as data to CBA(), I get an error:

> Error in discretizeDF.supervised(formula, data, method = disc.method) : 
>  Cannot discretize non-numeric column: GENAGEGROUPQ2Q8_2Q8_3Q8_4FAM_1FAM_2

When I coerce the data.frame to transactions:

> trans <- transactions(my.dataframe)

...CBA() works nicely, but it seems that the information about the order in the ordered factors is lost. Is there a workaround to keep the information about the order of the levels in the ordered factors? Perhaps treating them as integers (as in the example with the iris data)?

Many thanks! Zdenek Skala


Solution

  • Unfortunately, you are right: the order is lost. The concept of items in association rule mining does not preserve order information between items. Using numbers as a workaround does not help much either. The numbers will be discretized into buckets, and the buckets are again encoded as items without order information.
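To make the point above concrete, here is a minimal base-R sketch. It does not call arules or arulesCBA at all. The toy data frame, its column names, and the cut points are made up for illustration, and the unsupervised `cut()` call is only a stand-in for the supervised discretization that CBA() applies to numeric columns.

```r
# Hypothetical toy data; column names and values are illustrative only.
df <- data.frame(
  GEN      = factor(c("man", "woman", "woman", "man")),
  AGEGROUP = factor(c(1, 3, 5, 2), levels = 1:5, ordered = TRUE)
)

# In the data frame, the ordered factor still knows its order.
is.ordered(df$AGEGROUP)           # TRUE
df$AGEGROUP[1] < df$AGEGROUP[2]   # TRUE (group 1 < group 3)

# Item-style encoding: one yes/no flag per level, roughly what an
# item matrix stores. Nothing here says that "AGEGROUP=1" comes
# before "AGEGROUP=2"; the columns are independent flags.
items <- sapply(levels(df$AGEGROUP), function(l) df$AGEGROUP == l)
colnames(items) <- paste0("AGEGROUP=", levels(df$AGEGROUP))
items

# Integer workaround: the codes are numeric, but they get bucketed.
# cut() with arbitrary breaks stands in for the real discretization.
age_num <- as.integer(df$AGEGROUP)             # 1 3 5 2
buckets <- cut(age_num, breaks = c(0, 2, 5))   # levels "(0,2]" and "(2,5]"
is.ordered(buckets)                            # FALSE

# The buckets are then flagged as items in the same way as above.
bucket_items <- sapply(levels(buckets), function(l) buckets == l)
colnames(bucket_items) <- paste0("AGEGROUP=", levels(buckets))
bucket_items
```

Either way, each row ends up with exactly one AGEGROUP flag set, and a rule can only test whether a given flag is present, not whether one age group is higher than another.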