r transactions cluster-analysis arules discretization

Unsupervised discretization to convert continuous variables into categorical ones for frequent itemset mining


I am using the ‘arules’ package to mine frequent itemsets in a large dataset, but I cannot find a suitable method for discretization.

As the examples in the ‘arules’ package show, the function ‘discretize’ offers several basic unsupervised methods. However, I would like to estimate the optimal number of categories for my large dataset, which seems more reasonable than specifying the number of categories by hand.

Can you give me some advice on this? Thanks.

@Michael Hahsler


Solution

  • I think there is little guidance on this for unsupervised discretization. Look at the histogram of each variable and decide manually. For k-means, you could find k using internal validation techniques (e.g., the elbow method). For supervised discretization, there are methods that help you decide. Maybe someone else can help here.
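
The k-means suggestion above could look roughly like the following in base R. This is only a sketch under assumptions: the data are synthetic, the elbow is picked with a simple 5% marginal-gain threshold (one of many possible rules, not something the answer prescribes), and the names `k_max`, `gain`, and `k_best` are made up for illustration.

```r
# Sketch: pick k for k-means discretization of ONE numeric variable
# using a simple elbow-style rule. Base R only.
set.seed(42)

# Synthetic example data (assumption): three well-separated groups
x <- c(rnorm(200, mean = 0,  sd = 0.5),
       rnorm(200, mean = 5,  sd = 0.5),
       rnorm(200, mean = 10, sd = 0.5))

# Total within-cluster sum of squares for k = 1..k_max
k_max <- 8
wss <- sapply(seq_len(k_max), function(k) {
  kmeans(x, centers = k, nstart = 20)$tot.withinss
})

# To inspect the elbow visually:
# plot(seq_len(k_max), wss, type = "b", xlab = "k", ylab = "total within SS")

# Elbow heuristic (assumption): gain[k] is the share of the total sum of
# squares removed by going from k to k + 1 clusters; stop at the first k
# where that gain drops below 5%.
gain <- -diff(wss) / wss[1]
k_best <- which(gain < 0.05)[1]
if (is.na(k_best)) k_best <- k_max  # no elbow found within 1..k_max

# Refit with the chosen k and cut at midpoints between sorted centers
fit <- kmeans(x, centers = k_best, nstart = 20)
centers <- sort(as.vector(fit$centers))
breaks <- c(-Inf, (head(centers, -1) + tail(centers, -1)) / 2, Inf)
x_cat <- cut(x, breaks = breaks)
table(x_cat)
```

Once k is chosen, the same value can be passed to arules, for example `discretize(x, method = "cluster", breaks = k_best)`. The threshold is a judgment call, so it is still worth checking the histogram and the elbow plot for each variable.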