I am using the Package ‘arules’ to mine frequent itemsets in my big data, but I cannot find suitable methods for discretization.
As the example in Package ‘arules’, several basic unsupervised methods can be used in the function ‘discretization’, but I want to estimate optimal number of categories in my large dataset, it seems more reasonable than assigning the number of categories.
Can you give me good advices for this, thanks.
I think there is little guidance on this for unsupervised discretization. Look at the histogram for each variable and decide manually. For k-means you could potentially use strategies to find k using internal validation techniques (i.e., elbow method). For supervised discretization there exist methods that will help you decide. Maybe someone else can help here.