I'd like to use the Stanford Classifier for text classification. My features are mostly textual, but there are some numeric features as well (e.g. the length of a sentence).

I started off with the ClassifierExample and replaced the original features with a single real-valued feature F, which has the value 100 if a stop light is BROKEN and 0.1 otherwise. This results in the following code (apart from the makeStopLights() function, it is just the code of the original ClassifierExample class):
import java.util.ArrayList;
import java.util.List;

import edu.stanford.nlp.classify.LinearClassifier;
import edu.stanford.nlp.classify.LinearClassifierFactory;
import edu.stanford.nlp.ling.Datum;
import edu.stanford.nlp.ling.RVFDatum;
import edu.stanford.nlp.stats.ClassicCounter;
import edu.stanford.nlp.stats.Counter;

public class ClassifierExample {

    protected static final String GREEN = "green";
    protected static final String RED = "red";
    protected static final String WORKING = "working";
    protected static final String BROKEN = "broken";

    private ClassifierExample() {} // not instantiable

    // the definition of this function was changed!!
    protected static Datum<String,String> makeStopLights(String ns, String ew) {
        String label = (ns.equals(ew) ? BROKEN : WORKING);
        Counter<String> counter = new ClassicCounter<>();
        counter.setCount("F", (label.equals(BROKEN)) ? 100 : 0.1);
        return new RVFDatum<>(counter, label);
    }

    public static void main(String[] args) {
        // Create a training set
        List<Datum<String,String>> trainingData = new ArrayList<>();
        trainingData.add(makeStopLights(GREEN, RED));
        trainingData.add(makeStopLights(GREEN, RED));
        trainingData.add(makeStopLights(GREEN, RED));
        trainingData.add(makeStopLights(RED, GREEN));
        trainingData.add(makeStopLights(RED, GREEN));
        trainingData.add(makeStopLights(RED, GREEN));
        trainingData.add(makeStopLights(RED, RED));

        // Create a test set
        Datum<String,String> workingLights = makeStopLights(GREEN, RED);
        Datum<String,String> brokenLights = makeStopLights(RED, RED);

        // Build a classifier factory
        LinearClassifierFactory<String,String> factory = new LinearClassifierFactory<>();
        factory.useConjugateGradientAscent();
        // Turn on per-iteration convergence updates
        factory.setVerbose(true);
        // Small amount of smoothing
        factory.setSigma(10.0);

        // Build a classifier
        LinearClassifier<String,String> classifier = factory.trainClassifier(trainingData);

        // Check out the learned weights
        classifier.dump();

        // Test the classifier
        System.out.println("Working instance got: " + classifier.classOf(workingLights));
        classifier.justificationOf(workingLights);
        System.out.println("Broken instance got: " + classifier.classOf(brokenLights));
        classifier.justificationOf(brokenLights);
    }
}
In my understanding of linear classifiers, feature F should make the classification task pretty easy: after all, we just need to check whether the value of F is greater than some threshold. However, the classifier returns WORKING for every instance in the test set.

Now my question is: Have I done something wrong? Do I need to change some other part of the code for real-valued features to work, or is there something wrong with my understanding of linear classifiers?
Your code looks fine. Note that typically with a Maximum Entropy classifier you provide binary-valued features (1 or 0).

Here is some more reading on Maximum Entropy classifiers: http://web.stanford.edu/class/cs124/lec/Maximum_Entropy_Classifiers

Look at the slide titled "Feature-Based Linear Classifiers" to see the specific probability calculation for Maximum Entropy classifiers.
Here is the formula for your example case, with 1 feature and 2 classes ("working", "broken"):

probability(c1) = exp(w1 * f1) / total
probability(c2) = exp(w2 * f1) / total
total = exp(w1 * f1) + exp(w2 * f1)

Here w1 is the learned weight for "working" and w2 is the learned weight for "broken". The classifier selects the class with the higher probability. Note that f1 is your feature value (either 100 or 0.1).
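To make the formula concrete, here is a minimal sketch of that calculation in plain Java. The weights w1 and w2 (and the class name MaxEntFormulaDemo) are made up purely for illustration; they are not the weights your classifier actually learned:

public class MaxEntFormulaDemo {
    public static void main(String[] args) {
        // Hypothetical weights, for illustration only; all that matters is w1 > w2
        double w1 = 0.7;  // weight for "working"
        double w2 = 0.3;  // weight for "broken"
        for (double f1 : new double[]{100.0, 0.1}) {
            // Two-class maxent probabilities for this feature value
            double total = Math.exp(w1 * f1) + Math.exp(w2 * f1);
            double pWorking = Math.exp(w1 * f1) / total;
            double pBroken = Math.exp(w2 * f1) / total;
            System.out.printf("f1 = %5.1f -> p(working) = %.4f, p(broken) = %.4f%n",
                    f1, pWorking, pBroken);
        }
    }
}

With these weights, "working" gets the higher probability for both f1 = 100 and f1 = 0.1, which is exactly the behavior you observed.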
If you consider your specific example data, then since you have 2 classes, 1 feature, and a feature value that is always positive, it is not possible to build a maximum entropy classifier that separates that data: it will always guess all one way or the other.

For the sake of argument, say w1 > w2, and let v > 0 be your feature value (either 100 or 0.1). Then w1 * v > w2 * v, and thus exp(w1 * v) > exp(w2 * v), so you'll always assign more probability to class 1 regardless of the value of v.
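Another way to see this: a threshold test like "F > t" needs an intercept, but with a single always-positive feature the decision boundary is forced through the origin. One possible workaround (my own suggestion, not part of the original example) is to add a constant bias feature. Here is a sketch as a drop-in replacement for your makeStopLights(), using the same imports as your class; the feature name "BIAS" is hypothetical:

protected static Datum<String,String> makeStopLights(String ns, String ew) {
    String label = (ns.equals(ew) ? BROKEN : WORKING);
    Counter<String> counter = new ClassicCounter<>();
    counter.setCount("F", label.equals(BROKEN) ? 100 : 0.1);
    // Constant bias feature (hypothetical name): gives each class an
    // intercept, so the learned decision rule can become "F > threshold"
    // instead of being forced through the origin.
    counter.setCount("BIAS", 1.0);
    return new RVFDatum<>(counter, label);
}

With the bias present, the class scores become w1 * v + b1 versus w2 * v + b2, and the sign of (w1 - w2) * v + (b1 - b2) can flip as v changes, which is the threshold behavior you expected.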