classification cluster-analysis weka data-mining

Is it possible to use a numeric attribute as the class for K-means clustering?


@attribute CustomerID       NUMERIC
@attribute Age              {A,B,C,D,E,F,G,H,I,J,K}
@attribute Region           {A,B,C,D,E,F,G,H}
@attribute ProductSubClass  NUMERIC
@attribute ProductID        NUMERIC 
@attribute Quantity         NUMERIC
@attribute Cost             NUMERIC
@attribute sales            NUMERIC

@data
00141833,F,F,130207,4710105011011,2,44,52
01376753,E,E,110217,4710265849066,1,150,129
01603071,E,G,100201,4712019100607,1,35,39
01738667,E,F,530105,4710168702901,1,94,119

Above is the header and a portion of the training dataset file, training.arff. I want to use K-means clustering and the J48 classifier, and I can do that without any problems. The following is my test dataset, test.arff:

@attribute CustomerID       NUMERIC
@attribute Age              {A,B,C,D,E,F,G,H,I,J,K}
@attribute Region           {A,B,C,D,E,F,G,H}
@attribute ProductSubClass  NUMERIC
@attribute ProductID        INTEGER
@attribute Quantity         NUMERIC
@attribute Cost             NUMERIC
@attribute sales            NUMERIC

@data
1754698,H,A,560402,?,1,676,849
1027365,F,C,530404,?,1,170,219
956710,E,E,500303,?,1,36,59

In both cases I made sure that ProductID is selected as the class.

Here are the steps I followed:

Step 1: Apply the "AddCluster" filter to assign a K-means cluster to each instance in the dataset.
Step 2: Use the J48 classification algorithm to evaluate the performance of the clustering algorithm, with the 10-fold cross-validation option.
Step 3: Save the finalized model and close Weka (I close it to test whether I can reload the model and use it again).
Step 4: Load the model in Weka (using "Load model").
Step 5: This time, select "Supplied test set" and choose the test file to predict on (it has the same format as shown in the question above).
Step 6: Try "Re-evaluate model on current test set".

But here I get the notification "Data used to train model and test set are not compatible. Would you like to automatically wrap the classifier in an "InputMappedClassifier" before proceeding?" If I click "No", it shows "Train and test set are not compatible ... 5 != 6", and if I click "Yes", it gives the following output in plain text:

=== Predictions on user test set ===

    inst#     actual  predicted error prediction
        1          ?      0              ? 
        2          ?      0              ? 
        3          ?      0              ? 
        4          ?      0              ? 
        5          ?      0              ? 
        6          ?      0              ? 
        7          ?      0              ? 
        8          ?      0              ? 
        9          ?      0              ? 
       10          ?      0              ? 
       11          ?      0              ? 
       12          ?      0              ? 
       13          ?      0              ? 
       14          ?      0              ? 
       15          ?      0              ? 
       16          ?      1              ? 
       17          ?      0              ? 
       18          ?      0              ? 
       19          ?      0              ? 
       20          ?      0              ? 
       21          ?      0              ? 

Now:

  1. Is it possible to use the numeric field ProductID as the class? I have to predict each customer's choice of product, identified by ProductID, taking the other attributes into consideration.

  2. If so, I have run into another problem: "Train and test set are not compatible". Is there any connection between this error and choosing a numeric attribute as the class?

NOTE: I am using Weka 3.8.1 GUI


Solution

  • Possibly, your test dataset is missing the cluster ID attribute that the K-Means clustering operation might have added to the training set (did you tell Weka to do so?) but did not add to the test dataset.

    Aside from that, the whole point of K-Means is to use it for clustering and not for classification.

    So frankly, you are applying things incorrectly, not giving us readers enough information (J48?), and asking (at least) two questions here.
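
The suspected mismatch can be checked outside Weka by comparing the attribute declarations of the two ARFF headers. The following is a minimal Python sketch, not Weka's own compatibility check. It assumes that AddCluster appended a nominal attribute named "cluster" to the training data (the name and the labels "cluster1", "cluster2" are assumptions for illustration), and the helper names parse_attributes and header_differences are hypothetical. It handles only simple, unquoted attribute names and treats INTEGER and REAL as NUMERIC, as the ARFF format does.

```python
import re

# Training header as it might look after AddCluster ran.
# ASSUMPTION: the final "cluster" attribute and its labels are illustrative.
TRAIN_HEADER = """\
@attribute CustomerID       NUMERIC
@attribute Age              {A,B,C,D,E,F,G,H,I,J,K}
@attribute Region           {A,B,C,D,E,F,G,H}
@attribute ProductSubClass  NUMERIC
@attribute ProductID        NUMERIC
@attribute Quantity         NUMERIC
@attribute Cost             NUMERIC
@attribute sales            NUMERIC
@attribute cluster          {cluster1,cluster2}
"""

# Test header exactly as posted in the question (no cluster attribute).
TEST_HEADER = """\
@attribute CustomerID       NUMERIC
@attribute Age              {A,B,C,D,E,F,G,H,I,J,K}
@attribute Region           {A,B,C,D,E,F,G,H}
@attribute ProductSubClass  NUMERIC
@attribute ProductID        INTEGER
@attribute Quantity         NUMERIC
@attribute Cost             NUMERIC
@attribute sales            NUMERIC
"""

# Matches "@attribute <name> <type>" for unquoted names only.
ATTRIBUTE_LINE = re.compile(r"^@attribute\s+(\S+)\s+(.+?)\s*$", re.IGNORECASE)
NUMERIC_ALIASES = {"NUMERIC", "INTEGER", "REAL"}


def parse_attributes(header_text):
    """Return a list of (name, type) pairs, one per @attribute line, in order."""
    attributes = []
    for line in header_text.splitlines():
        match = ATTRIBUTE_LINE.match(line.strip())
        if match:
            name, type_spec = match.groups()
            # ARFF treats INTEGER and REAL as NUMERIC.
            if type_spec.upper() in NUMERIC_ALIASES:
                type_spec = "NUMERIC"
            attributes.append((name, type_spec))
    return attributes


def header_differences(train_text, test_text):
    """Return (missing_from_test, extra_in_test, type_mismatches) as name lists."""
    train = parse_attributes(train_text)
    test = parse_attributes(test_text)
    train_types = dict(train)
    test_types = dict(test)
    missing = [name for name, _ in train if name not in test_types]
    extra = [name for name, _ in test if name not in train_types]
    mismatched = [
        name for name, spec in train
        if name in test_types and test_types[name] != spec
    ]
    return missing, extra, mismatched


missing, extra, mismatched = header_differences(TRAIN_HEADER, TEST_HEADER)
print("missing from test set:", missing)
print("extra in test set:", extra)
print("type mismatches:", mismatched)
```

Under these assumptions only the cluster attribute is reported as missing from the test set, while the INTEGER versus NUMERIC difference on ProductID is not a mismatch. If that matches the real files, one option is to apply the same AddCluster filter to the test data, for example by wrapping the filter and classifier in Weka's FilteredClassifier so both sets pass through identical preprocessing.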