
How to add word-level features to Mallet SimpleTagger?


I have been going through this blog post which contains a SimpleTagger example.

It says:

Given an input file "sample" as follows:

    CAPITAL Bill noun
    slept non-noun
    here non-noun

where all but the last token on each line is a binary feature, and the last token on the line is the label name.
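To make the quoted rule concrete, here is a minimal Java sketch (Java 16+ for the record type) of how one such line splits into features and a label. It only mirrors the rule as quoted above; it is not Mallet's own parsing code, and the class and method names (`SampleLineParser`, `parse`) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class SampleLineParser {
    // A parsed line: the feature tokens plus the label name.
    record Parsed(List<String> features, String label) {}

    // Splits a line on whitespace; every token except the last is a feature,
    // and the last token is the label.
    static Parsed parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 2) {
            throw new IllegalArgumentException("Expected at least one feature and a label: " + line);
        }
        List<String> features = Arrays.asList(Arrays.copyOfRange(tokens, 0, tokens.length - 1));
        return new Parsed(features, tokens[tokens.length - 1]);
    }

    public static void main(String[] args) {
        String[] sample = {"CAPITAL Bill noun", "slept non-noun", "here non-noun"};
        for (String line : sample) {
            Parsed p = parse(line);
            // Prints the feature list followed by the label, e.g. "[CAPITAL, Bill] -> noun".
            System.out.println(p.features() + " -> " + p.label());
        }
    }
}
```

Note that under this rule the word itself ("Bill") is just one more binary feature token.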

So, how do I add the word-level features here?

For example: the number of syllables in the word, the length of the word, etc.


Solution

  • Everything before the last token on a line is treated as a feature, including the word itself. You should be able to add arbitrary features in front of the label:

    CAP SYL1 CHAR4 Bill noun
    SYL3 CHAR9 responded non-noun
    ...
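If you want to generate such lines automatically, here is a minimal Java sketch, assuming the feature names CAP, SYL<n> and CHAR<n> from the example above (they are illustrative, not required by Mallet; any whitespace-free token works). The class and method names are hypothetical, and the syllable count is a crude vowel-group heuristic with a trailing silent-e adjustment, not a dictionary-based count, so expect it to be wrong for some words.

```java
import java.util.ArrayList;
import java.util.List;

public class FeatureLineBuilder {
    // Rough syllable estimate (heuristic): count runs of consecutive vowels
    // (a, e, i, o, u, y), drop one for a trailing 'e' when more than one run
    // was found, and never return less than one.
    static int estimateSyllables(String word) {
        String w = word.toLowerCase();
        int count = 0;
        boolean prevVowel = false;
        for (char c : w.toCharArray()) {
            boolean vowel = "aeiouy".indexOf(c) >= 0;
            if (vowel && !prevVowel) {
                count++;
            }
            prevVowel = vowel;
        }
        if (w.endsWith("e") && count > 1) {
            count--;
        }
        return Math.max(count, 1);
    }

    // Builds one input line: feature tokens, then the word, then the label last.
    // CAP / SYL<n> / CHAR<n> are hypothetical feature names.
    static String buildLine(String word, String label) {
        List<String> tokens = new ArrayList<>();
        if (!word.isEmpty() && Character.isUpperCase(word.charAt(0))) {
            tokens.add("CAP");
        }
        tokens.add("SYL" + estimateSyllables(word));
        tokens.add("CHAR" + word.length());
        tokens.add(word);
        tokens.add(label);
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        String[][] sentence = {{"Bill", "noun"}, {"responded", "non-noun"}};
        for (String[] pair : sentence) {
            System.out.println(buildLine(pair[0], pair[1]));
        }
    }
}
```

Running `main` prints the two example lines from the answer ("CAP SYL1 CHAR4 Bill noun" and "SYL3 CHAR9 responded non-noun"). Encoding each value as its own token (CHAR4, CHAR9, ...) keeps every feature binary, as the quoted format expects; if exact lengths turn out to be too sparse for your training data, one option is to bucket them into coarser hypothetical features such as CHAR_SHORT and CHAR_LONG instead.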