Linear Chain Conditional Random Field Sequence Model - NER

I am confused with what a linear chain CRF implementation exactly is. While some people say that "The Linear Chain CRF restricts the features to depend on only the current(i) and previous label(i-1), rather than arbitrary labels throughout the sentence" , some people say that it restricts the features to depend on the current(i) and future label(i+1).

I am trying to understand the implementation that goes behind the Stanford NER Model. Can someone please explain what exactly the linear chain CRF Model is?

Solution

Both models would be linear chain CRF models. The important part about the "linear chain" is that the features depend only on the current label and one direct neighbour in the sequence. Usually this would be the previous label (because that corresponds with reading order), but it could also be the future label. Such a model model would basically process the sentence backwards, and I have never seen this in the literature, but it would still be a linear chain CRF).

As far as I know, the Stanford NER model is based on a model that uses the current and the previous label, but it also uses an extension that can also look to labels further back. It is therefore not a strict linear-chain model, but uses an extension described in this paper:

Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. Proceedings of the 43nd Annual Meeting of the Association for Computational Linguistics (ACL 2005), pp. 363-370. http://nlp.stanford.edu/~manning/papers/gibbscrf3.pdf