How is a feature different from a tag in Stanford NER?

The FAQ at http://nlp.stanford.edu/software/CRF-NER.shtml says we can include our own custom features when training. First of all, what do features do in NER? How are they different from the tags in the TSV training file? As asked in the question "Stanford-NER customization to classify software programming keywords", is it right to put tags such as 'Programming_Language' and 'Operating_System' in a feature column of the TSV file?

This is a bit confusing; please explain.

Solution

The tag is the label you want to apply to the token, for instance O, PERSON, LOCATION, ORGANIZATION, or PROGRAMMING_LANGUAGE. O means the token is not part of an entity.

A feature is an aspect of the token stream you want the CRF Classifier to use in its decision.

Consider the sentence "I went to France last summer."

The tags would be [O O O LOCATION O O O].

For instance a feature could be the word itself, "word=France".

A feature could be the two words preceding the current word in the sequence, "word_n-2_n-1=went to".

Or a feature could be something like the shape of the word, "shape=Xxxxxx".

The point of the features is that the CRF Classifier can find patterns, for instance that words with particular shapes tend to be O, or that particular words tend to belong to particular classes.
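
To make the distinction concrete, here is a minimal Python sketch that computes the three example features above for each token and prints them next to the tag. It is only an illustration of the idea: the function names word_shape and token_features are made up for this example, and this is not how Stanford NER builds its features internally.

```python
def word_shape(token):
    """Map each character to a coarse class: X = upper, x = lower, d = digit."""
    out = []
    for ch in token:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)  # keep punctuation as-is
    return "".join(out)


def token_features(tokens, i):
    """Return illustrative feature strings for the token at position i."""
    feats = ["word=" + tokens[i], "shape=" + word_shape(tokens[i])]
    if i >= 2:
        # The two words preceding the current word.
        feats.append("word_n-2_n-1=" + tokens[i - 2] + " " + tokens[i - 1])
    return feats


tokens = ["I", "went", "to", "France", "last", "summer", "."]
tags = ["O", "O", "O", "LOCATION", "O", "O", "O"]  # the labels to be learned

for i, tag in enumerate(tags):
    print(tag, token_features(tokens, i))
```

The tags are what the classifier has to predict; the feature strings are the evidence it looks at while predicting them.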

You do not need custom features if you simply want to add new categories such as PROGRAMMING_LANGUAGE or OPERATING_SYSTEM. You just need training data so the system can learn how to label tokens appropriately.
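
As a rough sketch of what that training data could look like, the Python snippet below writes one token and its tag per line, separated by a tab, so the new categories appear in the tag column and not as features. The example sentence and the to_tsv helper are hypothetical, and the blank line between sentences is an assumption about the expected layout; check the FAQ for the exact format your properties file expects.

```python
# Hypothetical hand-labeled sentence; the labels are for illustration only.
sentences = [
    [("I", "O"), ("wrote", "O"), ("it", "O"), ("in", "O"),
     ("Java", "PROGRAMMING_LANGUAGE"), ("on", "O"),
     ("Linux", "OPERATING_SYSTEM"), (".", "O")],
]


def to_tsv(sentences):
    """Render sentences as 'token<TAB>tag' lines, one token per line."""
    lines = []
    for sentence in sentences:
        for token, tag in sentence:
            lines.append(token + "\t" + tag)
        lines.append("")  # assumed: blank line separates sentences
    return "\n".join(lines) + "\n"


print(to_tsv(sentences), end="")
```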