Prediction in Weka using the Explorer


Once I have trained and generated a model, every example I have seen so far uses a testing set that contains values for both the actual and the predicted class. Is there a way to leave the actual column empty, or to omit it entirely, when I am doing the prediction?

To take an example, the following is my training set:

@relation supermarket
@attribute 'department1' { t}
@attribute 'department2' { t}
@attribute 'department3' { t}
@attribute value

and I am using a testing set like this:

@relation supermarket
@attribute 'department1' { t}
@attribute 'department2' { t}
@attribute 'department3' { t}
@attribute value

and the output looks like this:

@relation supermarket
@attribute 'department1' { t}
@attribute 'department2' { t}
@attribute 'department3' { t}
@attribute value
@attribute predicted-value
@attribute predicted-margin

My question is: can I either remove value from the testing set or leave it empty?


Solution

  • Case 1: Both your training and test set have class labels

    Training:

    @relation simple-training
    @attribute feature1 numeric
    @attribute feature2 numeric
    @attribute class {a,b}
    @data
    1, 2, b
    2, 4, a
    .......
    

    Testing:

    @relation simple-testing
    @attribute feature1 numeric
    @attribute feature2 numeric
    @attribute class {a,b}
    @data
    7, 12, a
    8, 14, a
    .......
    

    In this case, whether you use k-fold cross-validation or a train/test setup, Weka does not look at the class labels in the test set when making predictions. It builds its model from the training data, applies it blindly to the test set, and then compares its predictions with the actual class labels in your testing set.

    This is useful if you want to evaluate the performance of your classifier.

    Case 2: You have class labels for training data but you don't have class labels for testing data.

    Training:

    @relation simple-training
    @attribute feature1 numeric
    @attribute feature2 numeric
    @attribute class {a,b}
    @data
    1, 2, b
    2, 4, a
    .......
    

    Testing:

    @relation simple-testing
    @attribute feature1 numeric
    @attribute feature2 numeric
    @attribute class {a,b}
    @data
    7, 12, ?
    8, 14, ?
    .......
    

    This is perfectly normal, since it is exactly what we need to do: apply the trained model to unseen, unlabeled data in order to label it. In that case, simply put ? in the class column of your testing data. After running Weka on this setup, you will get output in which these ? marks are replaced by the predicted values (do not add any extra column, as this will give you an error).

    So, in a nutshell: your training and testing data must be compatible, that is, they must declare the same attributes in the same order. In the testing data, if you don't know a value and want to predict it, put a ? in that column.
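
As a rough illustration of the two points above, here is a minimal Python sketch (standard library only, not part of Weka) that writes a testing file with ? in the class column and then does a simple textual check that its attribute declarations match those of the training file. The helper names build_unlabeled_arff and headers_match are made up for this example, and the header comparison is only a plain-text approximation of the compatibility check Weka itself performs.

```python
def build_unlabeled_arff(relation, features, class_name, class_values, rows):
    """Return ARFF text in which every row has '?' as its class value.

    features is a list of (name, arff_type) pairs, e.g. ("feature1", "numeric").
    """
    lines = [f"@relation {relation}"]
    for name, arff_type in features:
        lines.append(f"@attribute {name} {arff_type}")
    # Nominal class: must use the same name and value set as the training file.
    lines.append("@attribute " + class_name + " {" + ",".join(class_values) + "}")
    lines.append("@data")
    for row in rows:
        if len(row) != len(features):
            raise ValueError(f"expected {len(features)} values, got {len(row)}")
        # '?' is the ARFF missing-value marker; here it stands for the unknown class.
        lines.append(", ".join(str(value) for value in row) + ", ?")
    return "\n".join(lines) + "\n"


def attribute_declarations(arff_text):
    """Collect whitespace-normalized @attribute lines from the header."""
    declarations = []
    for line in arff_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        keyword = parts[0].lower()  # ARFF keywords are case-insensitive
        if keyword == "@data":
            break  # the header ends where the data section begins
        if keyword == "@attribute":
            declarations.append(" ".join([keyword] + parts[1:]))
    return declarations


def headers_match(train_text, test_text):
    """True if both files declare the same attributes in the same order."""
    return attribute_declarations(train_text) == attribute_declarations(test_text)


train_text = """@relation simple-training
@attribute feature1 numeric
@attribute feature2 numeric
@attribute class {a,b}
@data
1, 2, b
2, 4, a
"""

test_text = build_unlabeled_arff(
    "simple-testing",
    [("feature1", "numeric"), ("feature2", "numeric")],
    "class",
    ["a", "b"],
    [(7, 12), (8, 14)],
)

print(test_text)
print("headers match:", headers_match(train_text, test_text))
```

The generated text can be saved to a file and supplied as the test set in the Explorer. The relation names are allowed to differ here; only the attribute declarations are compared.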