I have been given 2 data sets and want to perform cluster analysis for the sets using KNIME.
Once I have completed the clustering, I wish to carry out a performance comparison of 2 different clustering algorithms.
With regard to the performance analysis of clustering algorithms, would this be a measure of time (the algorithm's time complexity, the time taken to cluster the data, etc.), the validity of the resulting clusters, or both?
Is there any other angle one can look at to assess the performance (or lack thereof) of a clustering algorithm?
Many thanks in advance,
It depends a lot on what data you have available.
A common way of measuring performance is to compare the clusters against existing ("external") labels, although that makes more sense for classification than for clustering. There are around two dozen measures you can use for this, the adjusted Rand index and normalized mutual information being two well-known ones.
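As a rough illustration of an external measure, here is a minimal pure-Python sketch of the adjusted Rand index. The function name and the toy label lists are my own choices for the example and are not tied to KNIME; the index is 1.0 for identical partitions (regardless of how the clusters are named) and close to 0 for random agreement.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same objects."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same objects")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Count pairs of objects that share a cluster in both labelings,
    # in the reference labeling only, and in the predicted labeling only.
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. everything in one cluster
    return (sum_cells - expected) / (maximum - expected)


if __name__ == "__main__":
    reference = [0, 0, 1, 1]
    # Same partition with the cluster names swapped.
    print(adjusted_rand_index(reference, [1, 1, 0, 0]))
    # A partition that splits every reference cluster.
    print(adjusted_rand_index(reference, [0, 1, 0, 1]))
```

You would compute this once per algorithm against the same reference labels and compare the values.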
When using an "internal" quality measure, make sure that it is independent of the algorithms being compared. For example, k-means itself optimizes such a measure (the within-cluster sum of squares), so it will usually come out best when the results are evaluated with respect to that measure.
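To make the k-means example concrete, here is a small sketch of the within-cluster sum of squares, which is the quantity k-means minimizes. The function name and the toy points are assumptions for illustration only. Since k-means searches directly for a low value of this number, ranking algorithms by it is biased in favour of k-means.

```python
from collections import defaultdict


def within_cluster_sum_of_squares(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    if len(points) != len(labels):
        raise ValueError("need exactly one label per point")

    # Group the points by cluster label.
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    total = 0.0
    for members in clusters.values():
        dims = len(members[0])
        # Cluster mean, one coordinate at a time.
        mean = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        for p in members:
            total += sum((p[d] - mean[d]) ** 2 for d in range(dims))
    return total


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    # Grouping the two left points and the two right points: compact clusters.
    print(within_cluster_sum_of_squares(points, [0, 0, 1, 1]))
    # Grouping by height instead: much more spread-out clusters.
    print(within_cluster_sum_of_squares(points, [0, 1, 0, 1]))
```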