Tags: cluster-analysis, weka, data-mining, k-means, rapidminer

RapidMiner and WEKA: different clustering results


I am new to data mining analytics and machine learning. I have been trying to compare predictive analysis and cluster analysis in RapidMiner and Weka for a college assignment.

After studying the advantages and disadvantages of both tools, I started the analysis and ran into some problems. I tried clustering with K-means in RapidMiner and SimpleKMeans in Weka, and regression analysis with LinearRegression, and I am not satisfied with the results: the two tools give significantly different output. In every case I used the same numerical dataset.

I have spent a lot of time studying how each algorithm is initialized in each tool. The interfaces differ, and some parameters exist in RapidMiner but not in Weka, or vice versa, so I am a bit confused. (Is that the problem?)

Apart from that, what do you think is wrong? Is there some initialization step that I missed? Or is it because the code differs between the tools even though they use the same algorithm?

Thank you for your answer!


Solution

  • Weka often applies built-in normalization, at least in k-means and some other algorithms.

    Make sure you have disabled it if you want the results to be comparable.

    Also understand that k-means is a randomized algorithm: it starts from randomly chosen initial centroids. Different results, even from the same package, are to be expected (and desirable).
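
To make both points concrete, here is a minimal sketch in plain Python. It is not the code used by Weka or RapidMiner; it is a toy implementation of Lloyd's k-means written for illustration only. The function names (`min_max_normalize`, `kmeans`), the `init` parameter, and the four-point dataset are all made up for this example. It shows that the same data, clustered from the same starting centroids, can split differently depending on whether the attributes are normalized first.

```python
import random


def min_max_normalize(points):
    """Rescale every column to [0, 1] (min-max normalization)."""
    columns = list(zip(*points))
    lows = [min(col) for col in columns]
    # Guard against constant columns to avoid dividing by zero.
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [tuple((v - lo) / span for v, lo, span in zip(p, lows, spans))
            for p in points]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=None, init=None, max_iter=100):
    """Toy Lloyd's k-means; returns the cluster label of each point.

    init: optional list of k starting centroids (hypothetical parameter,
    used here to make the comparison reproducible). If omitted, k distinct
    points are drawn at random using `seed`.
    """
    rng = random.Random(seed)
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels


# Made-up data: the first attribute ranges over 0..1, the second over 0..3000.
data = [(0.0, 0.0), (1.0, 1000.0), (0.0, 2000.0), (1.0, 3000.0)]
scaled = min_max_normalize(data)

# Same starting centroids (first and last point) in both runs.
raw_labels = kmeans(data, k=2, init=[data[0], data[3]])
scaled_labels = kmeans(scaled, k=2, init=[scaled[0], scaled[3]])
print(raw_labels)     # [0, 0, 1, 1]  (split driven by the large-scale attribute)
print(scaled_labels)  # [0, 1, 0, 1]  (split driven by the first attribute)

# With random initialization, the outcome can also depend on the seed.
for seed in (1, 2, 3):
    print(seed, kmeans(scaled, k=2, seed=seed))
```

On the raw data the large-scale attribute dominates the distance, so the split follows it; after normalization both attributes carry equal weight and the split changes. If one tool normalizes internally and the other does not, this alone can explain a different clustering. To compare two tools fairly, apply the same preprocessing in both, and either fix the random seed where the tool allows it or compare results over several runs.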