stanford-nlp

Can I choose a pos.model in the Stanford Parser?


I want to use gate-EN-twitter.model for POS tagging while parsing with the Stanford Parser. Is there a command-line option that does that, something like -pos.model gate-EN-twitter.model? Or do I have to tag the text first with the Stanford POS Tagger using the GATE model, and then use its output as input for the parser?

Thanks!


Solution

  • If I understand you correctly, you want to force the Stanford Parser to use the tags generated by this Twitter-specific POS tagger. That's definitely possible, though this tweet from Stanford NLP about this exact model should serve as a warning:

    Tweet from Stanford NLP, 13 Apr 2014:

    Using CoreNLP on social media? Try GATE Twitter model (iff not parsing…) -pos.model gate-EN-twitter.model https://gate.ac.uk/wiki/twitter-postagger.html #nlproc

    (https://twitter.com/stanfordnlp/status/455409761492549632)
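    For CoreNLP (rather than the standalone parser), the -pos.model flag in the tweet sets the pos.model property. Below is a minimal sketch of that configuration using only java.util.Properties. It assumes you have downloaded the GATE model to a local path; the path shown is hypothetical. The actual StanfordCoreNLP construction is left as a comment so the snippet runs without CoreNLP on the classpath. Following the tweet's advice, parse is deliberately left out of the annotator list.

```java
import java.util.Properties;

final class TwitterPosConfig {
    // Hypothetical local path to the GATE Twitter tagger model; adjust to wherever you saved it.
    static final String GATE_MODEL = "models/gate-EN-twitter.model";

    // Builds CoreNLP properties equivalent to passing "-pos.model <path>" on the command line.
    // "parse" is intentionally omitted from the annotators, per the Stanford NLP tweet.
    static Properties twitterPosProperties(String modelPath) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos");
        props.setProperty("pos.model", modelPath);
        return props;
    }

    public static void main(String[] args) {
        Properties props = twitterPosProperties(GATE_MODEL);
        // With CoreNLP on the classpath you would then create the pipeline, e.g.:
        //   StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        System.out.println(props.getProperty("pos.model"));
    }
}
```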

    That being said, if you really want to try, we can't stop you :)

    There is a parser FAQ entry on forcing the parser to use your own tags. See http://nlp.stanford.edu/software/parser-faq.shtml#f

    Basically, you have two options (see the FAQ for full details):

    • If calling the parser from the command line, you can pre-tag your text file and then tell the parser that the input is already tokenized and tagged, using command-line options such as -tokenized and -tagSeparator.
    • If parsing programmatically, the LexicalizedParser#parse method accepts a List<? extends HasWord>. If the list elements also implement HasTag (for example, TaggedWord), the parser treats those tags as gold. Just pre-tag your list (using the CoreNLP pipeline or MaxentTagger) and pass that token list to the parser.
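    The command-line route can be sketched as follows. The snippet only assembles the argument list for LexicalizedParser, so it runs without any Stanford jars present. The flag set follows the one the parser FAQ gives for pre-tagged input, but double-check it against the FAQ for your parser version. The classpath, grammar path, and input file name are placeholders. The "_" separator assumes the tagger wrote tokens as word_TAG, one sentence per line.

```java
import java.util.ArrayList;
import java.util.List;

final class PreTaggedParseCommand {
    // Assembles a LexicalizedParser invocation for pre-tagged input such as "I_PRP love_VBP it_PRP".
    // classpath, grammar and taggedFile are placeholders; tagSeparator must match the tagger output.
    static List<String> build(String classpath, String grammar, String taggedFile, String tagSeparator) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-mx1g");
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add("edu.stanford.nlp.parser.lexparser.LexicalizedParser");
        cmd.add("-sentences");        // one sentence per input line
        cmd.add("newline");
        cmd.add("-tokenized");        // input is already tokenized
        cmd.add("-tagSeparator");     // each token looks like word<separator>TAG
        cmd.add(tagSeparator);
        cmd.add("-tokenizerFactory"); // split tokens on whitespace only
        cmd.add("edu.stanford.nlp.process.WhitespaceTokenizer");
        cmd.add("-tokenizerMethod");
        cmd.add("newCoreLabelTokenizerFactory");
        cmd.add(grammar);
        cmd.add(taggedFile);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("*", "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz",
                "tweets.tagged", "_");
        System.out.println(String.join(" ", cmd));
        // To actually run it: new ProcessBuilder(cmd).inheritIO().start();
    }
}
```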