I am new to CRFs and some of my terminology might be skewed so bear with me. I'm assuming the Stanford NER implements a linear chain CRF.
Let x be a sequence of words and y the sequence of corresponding tags. Call x an example and y its label. A component x_i of x is a word. A component y_i of y is a tag.
When training the model we provide it with something like:
James PERSON
lives O
in O
Chicago LOCATION
. O
Coffee O
in O
Trieste LOCATION
is O
great O
. O
Does the model use individual sentences as examples? Using the data above, is one of the examples < Coffee in Trieste is great . >? Does this mean that a feature function cannot depend on words in previous sentences?
If this is indeed the case, how does the model make sure that each example is indeed a sentence? Does it do any sentence boundary detection? Can it be made to look at e.g. batches of 4 sentences?
Thank you in advance :)
A blank line (two consecutive newlines) is treated as the boundary of an example. An example can be anything from a phrase to a whole document. So for your data, if you want the two sentences to be two examples:
James PERSON
lives O
in O
Chicago LOCATION
. O

Coffee O
in O
Trieste LOCATION
is O
great O
. O
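
To make the blank-line convention concrete, here is a minimal Python sketch of how column-format data could be split into examples. This is only an illustration and not Stanford NER's actual reader; the function name `split_examples` is hypothetical, and it assumes each non-blank line holds exactly one word and one tag separated by whitespace.

```python
def split_examples(text):
    """Split column-format text into examples at blank lines.

    Hypothetical helper for illustration; assumes one "word tag"
    pair per non-blank line.
    """
    examples, current = [], []
    for line in text.splitlines():
        if line.strip():
            word, tag = line.split()
            current.append((word, tag))
        elif current:
            # A blank line closes the current example.
            examples.append(current)
            current = []
    if current:
        # Flush the last example if the text does not end with a blank line.
        examples.append(current)
    return examples


data = """James PERSON
lives O
in O
Chicago LOCATION
. O

Coffee O
in O
Trieste LOCATION
is O
great O
. O
"""

examples = split_examples(data)
print(len(examples))                  # number of examples found
print([w for w, _ in examples[1]])    # words of the second example
```

With the blank line removed, the same function would return a single example covering both sentences, which is how you would group several sentences (for instance batches of 4) into one example.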