cluster-analysis data-mining gaussian elki

How to choose the delta value in EM clustering in ELKI


What value of delta should we choose for EM clustering?

Different values of delta give different values of the evaluation measures.


Solution

  • The delta parameter in EM is needed to detect convergence. Because EM uses soft assignments internally, it would otherwise keep refining the values to ever more digits (technically, it would eventually run out of floating-point precision and stop). As long as you choose a small enough value, you should be fine.

    However, EM is initialized randomly. There are different options for initialization, but it is best practice to start from a randomized one. Running EM multiple times and keeping only the best result is a feasible way to reduce the chance of ending up in a merely local optimum.

    Therefore, it is not at all surprising that you get different results. In fact, you should be seeing different results with the same delta, too...

    See also: Wikipedia EM clustering
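
To make the two points above concrete, here is a minimal sketch in plain Python. It is not ELKI's implementation: the function names `em_1d` and `best_of`, the one-dimensional mixture, the toy data, and the rule "stop when the log-likelihood gain falls below delta" are all assumptions chosen for illustration. It shows delta acting purely as a stopping threshold, and random restarts being compared by their final log-likelihood.

```python
import math
import random


def log_likelihood(data, weights, means, variances):
    """Total log-likelihood of the data under a 1-D Gaussian mixture."""
    total = 0.0
    for x in data:
        p = sum(
            w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        )
        total += math.log(max(p, 1e-300))  # guard against log(0)
    return total


def em_1d(data, k, delta, seed, max_iter=1000):
    """Run EM once from a random start (hypothetical helper, not ELKI code).

    Stops when the log-likelihood gain of an iteration is below delta.
    Returns (log_likelihood, iterations_used, means).
    """
    rng = random.Random(seed)
    means = rng.sample(data, k)  # random initialization: k data points
    variances = [1.0] * k
    weights = [1.0 / k] * k
    prev = log_likelihood(data, weights, means, variances)
    cur = prev
    for iteration in range(1, max_iter + 1):
        # E-step: soft assignment of every point to every component.
        resp = []
        for x in data:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            s = sum(dens) or 1e-300
            resp.append([d / s for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsed component
        cur = log_likelihood(data, weights, means, variances)
        if cur - prev < delta:  # convergence test: this is all delta does here
            return cur, iteration, means
        prev = cur
    return cur, max_iter, means


def best_of(data, k, delta, seeds):
    """Run EM once per seed and keep the run with the highest log-likelihood."""
    runs = [em_1d(data, k, delta, seed) for seed in seeds]
    return max(runs, key=lambda run: run[0])


# Toy data (an assumption for the demo): two 1-D Gaussian clusters.
_data_rng = random.Random(0)
data = [_data_rng.gauss(0.0, 1.0) for _ in range(100)]
data += [_data_rng.gauss(8.0, 1.0) for _ in range(100)]

loose = em_1d(data, k=2, delta=1e-1, seed=1)  # stops early
tight = em_1d(data, k=2, delta=1e-9, seed=1)  # same start, runs longer
best = best_of(data, k=2, delta=1e-9, seeds=range(5))
```

With the seed held fixed, a smaller delta only lets the same run continue for more iterations, so any remaining differences come from stopping earlier or later. Changing the seed changes the starting point, which is why results can differ even when delta stays the same.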