I am looking at Stanford NER and want to know how the words are represented. Are they converted to vectors using Word2Vec or GloVe when training the model with a linear-chain CRF?
A little more study shows me that the data is stored in a CRFDatum structure. Can anyone elaborate on this?
Well, now I know how the old-school AI people feel...
Back in the Old Days (including when the NER system was built), before neural networks took off, statistical ML converted discrete inputs into vectors using custom-built featurizers. For language, this usually resulted in a very long but sparse vector of one-hot features. For example, a featurizer might assign each word a one-hot representation: 1 at the index corresponding to the word, and 0 everywhere else. For NER, these features were usually things like the characters in the word (one-hot encoded), prefixes and suffixes of length $k$, word shape, part-of-speech tag, etc.
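To make that concrete, here is a rough standalone sketch (plain Java, not Stanford's actual featurizer, and the class/feature names are made up for illustration) of how a featurizer turns a word into a handful of sparse indicator features; each feature name corresponds to one dimension of a huge, mostly-zero vector that is set to 1 for this word:

import java.util.ArrayList;
import java.util.List;

// Toy featurizer: turns a single word into a list of sparse indicator features.
public class ToyFeaturizer {
    public static List<String> featurize(String word) {
        List<String> feats = new ArrayList<>();
        feats.add("WORD=" + word);                               // the word itself, one-hot
        int k = Math.min(3, word.length());
        feats.add("PREFIX=" + word.substring(0, k));             // prefix of length k
        feats.add("SUFFIX=" + word.substring(word.length() - k)); // suffix of length k
        feats.add("SHAPE=" + shape(word));                       // word shape, e.g. "Xxxx"
        return feats;
    }

    // Collapse characters into a coarse shape: uppercase -> X, lowercase -> x, digit -> d.
    private static String shape(String word) {
        StringBuilder sb = new StringBuilder();
        for (char c : word.toCharArray()) {
            if (Character.isUpperCase(c)) sb.append('X');
            else if (Character.isLowerCase(c)) sb.append('x');
            else if (Character.isDigit(c)) sb.append('d');
            else sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(featurize("Stanford"));
        // [WORD=Stanford, PREFIX=Sta, SUFFIX=ord, SHAPE=Xxxxxxxx]
    }
}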
In Stanford's code, these sparse vectors are usually represented as Counter objects of one form or another, which then get passed into a Datum object and converted into a more densely packed Dataset object, which is fed into the optimizer (usually QNMinimizer, implementing L-BFGS).
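If it helps, here is a rough illustration of the idea behind that Counter -> Datum -> Dataset step, again in plain Java rather than Stanford's actual classes (the names ToyDataset, add, numFeatures are invented for this sketch): string-keyed feature counts get mapped to integer indices so the whole training set can be stored compactly as int arrays before being handed to the optimizer.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for the "densely packed" dataset: each training example's
// sparse feature counter is re-expressed as an array of integer feature ids.
public class ToyDataset {
    private final Map<String, Integer> featureIndex = new HashMap<>();
    private final List<int[]> data = new ArrayList<>();    // one int[] of feature ids per datum
    private final List<String> labels = new ArrayList<>(); // gold label per datum

    // Add one training example: a sparse "counter" of features plus its label.
    public void add(Map<String, Double> featureCounts, String label) {
        int[] ids = new int[featureCounts.size()];
        int i = 0;
        for (String feat : featureCounts.keySet()) {
            // Assign the next free index the first time a feature name is seen.
            ids[i++] = featureIndex.computeIfAbsent(feat, f -> featureIndex.size());
        }
        data.add(ids);
        labels.add(label);
    }

    public int numFeatures() { return featureIndex.size(); }

    public static void main(String[] args) {
        ToyDataset dataset = new ToyDataset();
        Map<String, Double> counts = new HashMap<>();
        counts.put("WORD=Stanford", 1.0);
        counts.put("SHAPE=Xxxxxxxx", 1.0);
        dataset.add(counts, "ORGANIZATION");
        System.out.println("features indexed: " + dataset.numFeatures()); // 2
    }
}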