How to create a new attribute by combining two existing attributes in Weka?


I am new to Weka and have found little documentation on the filter functions. I have two attributes, profit and cost. I would like to create a new attribute named result that compares them row by row: the label should be 'gain' if profit > cost, and 'loss' otherwise.

I am using the Weka Explorer UI. I tried the Copy and MergeTwoValues filters, but neither seems able to do the comparison step. What would be the right approach?


Solution

  • Such a comparison is possible with the MathExpression filter and its ifelse construct, used as one step in a filter pipeline. MathExpression cannot produce labels, but it can produce a numeric indicator (1 or 0) that marks gain or loss.

    MultiFilter
    |
    +- Add  (we insert a new numeric attribute, all missing values)
    |
    +- ReplaceMissingWithUserConstant (MathExpression skips missing values, hence replacing them in our new attribute)
    |
    +- MathExpression (the actual comparison between the two attributes)
    |
    +- NumericToNominal (to turn the numeric 0/1 values into labels)
    

    I will demonstrate how to construct this pipeline using the bolts UCI dataset, which has the following attributes:

    1 RUN     numeric
    2 SPEED1  numeric
    3 TOTAL   numeric
    4 SPEED2  numeric
    5 NUMBER2 numeric
    6 SENS    numeric
    7 TIME    numeric
    8 T20BOLT numeric
    

    For this example, I want to compare SENS and TIME, creating an indicator of whether SENS > TIME.

    MultiFilter

    The MultiFilter instance combines all our sub-filters into a single filter setup. That way you can easily apply it, extend it, or use it within a FilteredClassifier setup.

    Add

    First, we use the Add filter to insert a new attribute at index 8, naming it SENS>TIME (you can give it any name you want). This pushes the class attribute to position 9:

    weka.filters.unsupervised.attribute.Add -N SENS>TIME -C 8
    

    ReplaceMissingWithUserConstant

    Next, we use the ReplaceMissingWithUserConstant filter to replace the missing values in our new attribute (index 8) with a dummy value, e.g., -1. This is unfortunately necessary, since MathExpression does not operate on missing values.

    weka.filters.unsupervised.attribute.ReplaceMissingWithUserConstant -A 8 -R -1 -F "yyyy-MM-dd\'T\'HH:mm:ss"
    

    MathExpression

    With the stage set, we can now use MathExpression to fill in our comparison using the expression ifelse(A6>A7,1,0):

    weka.filters.unsupervised.attribute.MathExpression -E ifelse(A6>A7,1,0) -V -R 8
    

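    If attribute 6 (SENS) is greater than attribute 7 (TIME), the expression yields 1, otherwise 0. The -R option lists the columns to ignore and -V inverts that selection, so -V -R 8 means only attribute 8 is modified.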
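
    To make the semantics of the comparison step concrete, here is a minimal plain-Java sketch of what ifelse(A6>A7,1,0) restricted to attribute 8 computes per row. It does not use the Weka API: the class name, the method name, and the data rows are made up for illustration, and missing values are modelled as NaN. Following the description above, a missing target cell is skipped, which is why the placeholder -1 is needed first.

    ```java
    public class IfElseSketch {

        // Emulates ifelse(A<left> > A<right>, 1, 0) written into attribute <target>.
        // Indices are 1-based, mirroring the A6/A7 references in the expression.
        static void applyComparison(double[][] rows, int left, int right, int target) {
            for (double[] row : rows) {
                // Assumption taken from the answer: cells that are missing (NaN here)
                // in the target attribute are skipped and stay missing.
                if (Double.isNaN(row[target - 1])) {
                    continue;
                }
                row[target - 1] = row[left - 1] > row[right - 1] ? 1.0 : 0.0;
            }
        }

        public static void main(String[] args) {
            // Hypothetical rows with 9 attributes; attribute 8 is the new column.
            double[][] rows = {
                {1, 2, 10, 1.5, 0, 6, 5.70, -1, 20},          // placeholder -1 gets overwritten
                {2, 6, 30, 1.5, 0, 6, 17.56, -1, 25},         // placeholder -1 gets overwritten
                {3, 6, 30, 1.5, 0, 6, 11.28, Double.NaN, 25}, // missing target cell is skipped
            };
            applyComparison(rows, 6, 7, 8);
            for (double[] row : rows) {
                System.out.println("attribute 8 = " + row[7]);
            }
        }
    }
    ```

    The third row shows why the ReplaceMissingWithUserConstant step comes before MathExpression: without a placeholder, the new attribute would remain missing.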

    NumericToNominal

    With the NumericToNominal filter, we turn the numeric indicators in our comparison attribute into nominal labels:

    weka.filters.unsupervised.attribute.NumericToNominal -R 8
    

    Bonus

    If you want to use the labels gain/loss instead of 1/0, then you can add the RenameNominalValues filter at the end of the pipeline.
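
    As a rough illustration of what that renaming step amounts to, the plain-Java sketch below maps the nominal labels 1 and 0 to gain and loss. It is not Weka code; the class and method names are hypothetical. In Weka itself, I believe the filter would take an option string along the lines of -R 8 -N "1:gain,0:loss", but treat that as an assumption and check the filter's option listing in the Explorer.

    ```java
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class RenameLabelsSketch {

        // Returns the replacement for a label, or the label itself if no mapping exists.
        static String rename(String label, Map<String, String> mapping) {
            return mapping.getOrDefault(label, label);
        }

        public static void main(String[] args) {
            // Mapping from the indicator labels to the labels asked for in the question.
            Map<String, String> mapping = new LinkedHashMap<>();
            mapping.put("1", "gain");
            mapping.put("0", "loss");

            // Hypothetical indicator column after NumericToNominal.
            String[] indicator = {"1", "0", "0", "1"};
            for (String label : indicator) {
                System.out.println(label + " becomes " + rename(label, mapping));
            }
        }
    }
    ```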