Suppose I already know the POS tag of some word.
E.g. I know that st316 (my ID) is a proper noun (NR). In the sentence "I am st316.", how can I make the tagger use the information that st316 is an NR, and then decide the POS tags of the other words (I am)?
For example:
Input:I am st316/NR .
Output: I/PN am/VC st316/NR ./PU
Please help. Thanks a lot!
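The desired input/output format can at least be reproduced by overriding tags after tagging. This is a minimal Python sketch, assuming the tagger returns a list of (word, tag) pairs. The helper names `parse_constrained`, `apply_fixed_tags` and `format_tagged` are hypothetical, and the tagger output in the example is made up. Note that this does not let the tagger use the known tag when it decides the tags of the other words.

```python
def parse_constrained(sentence):
    """Split 'I am st316/NR .' into tokens and a {position: tag} map.

    Assumption: a token of the form word/TAG carries a known tag; tokens
    that themselves contain '/' (e.g. '1/2') would be misread as tagged.
    """
    tokens, fixed = [], {}
    for i, item in enumerate(sentence.split()):
        word, sep, tag = item.rpartition("/")
        if sep and word and tag:
            tokens.append(word)
            fixed[i] = tag
        else:
            tokens.append(item)
    return tokens, fixed


def apply_fixed_tags(tagged, fixed):
    """Overwrite the tagger's tag at every position whose tag is known."""
    return [(w, fixed.get(i, t)) for i, (w, t) in enumerate(tagged)]


def format_tagged(tagged):
    """Render (word, tag) pairs as 'word/TAG word/TAG ...'."""
    return " ".join(f"{w}/{t}" for w, t in tagged)


if __name__ == "__main__":
    tokens, fixed = parse_constrained("I am st316/NR .")
    # Made-up tagger output for `tokens`; a real tagger would produce this.
    tagged = [("I", "PN"), ("am", "VC"), ("st316", "NN"), (".", "PU")]
    print(format_tagged(apply_fixed_tags(tagged, fixed)))
```

Running it prints `I/PN am/VC st316/NR ./PU`, the output format asked for above.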
I can think of 2 options:

1. Post-process the tagger output: if you know that st316 must be tagged as X and Stanford failed to tag it as such, change the tag of st316 to X. The disadvantage of this approach is that the tagger is not able to use that information to better tag the rest of the sentence.
2. Retrain the tagger with your extra knowledge added to the training data.

If you go with option 2, you need to format your data as follows:
An_DT avocet_NN is_VBZ a_DT small_JJ ,_, cute_JJ bird_NN ._.
I_PRP am_VBP st316_NNP ._.
I_PRP am_VBP st316_NNP ._.
I_PRP am_VBP st316_NNP ._.
I_PRP am_VBP st316_NNP ._.
I_PRP am_VBP st316_NNP ._.
The first line is taken from the Stanford FAQ. The rest is your extra knowledge (written here with the English Penn Treebank tag NNP; with a Chinese model you would use NR, as in the question). Note that the one extra sentence is repeated. This adds pseudo-counts to that observation. Informally, if you included st316_NNP only once in the training data, chances are the tagger would treat it as noise or an error and ignore it. Repeating it is like saying "Yes, I am sure, I know what I'm doing, learn from that data". Depending on how much data you have, you will need anywhere between 5 and 50 repetitions to ensure the tagger learns properly.
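Generating such a training file by hand gets tedious once you vary the number of repetitions. Below is a minimal Python sketch that appends each extra sentence `repeats` times to an existing corpus in the word_TAG format used above. The function names are hypothetical, and the `_` separator simply matches the example; check which tag separator your tagger's training configuration expects before using the output.

```python
def to_tagged_line(pairs, sep="_"):
    """Render [(word, tag), ...] as one training line, e.g. 'I_PRP am_VBP'."""
    return " ".join(f"{w}{sep}{t}" for w, t in pairs)


def build_training_lines(base_lines, extra_sentences, repeats):
    """Return the base corpus followed by each extra sentence repeated
    `repeats` times, which acts as a pseudo-count for that observation."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    lines = list(base_lines)
    for sentence in extra_sentences:
        lines.extend([sentence] * repeats)
    return lines


if __name__ == "__main__":
    base = ["An_DT avocet_NN is_VBZ a_DT small_JJ ,_, cute_JJ bird_NN ._."]
    extra = [to_tagged_line([("I", "PRP"), ("am", "VBP"), ("st316", "NNP"), (".", ".")])]
    # One sentence per line, as in the training data shown above.
    print("\n".join(build_training_lines(base, extra, repeats=5)))
```

With `repeats=5` this prints exactly the six lines shown above; raising `repeats` is how you would try values in the 5 to 50 range.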