I found it inefficient to restart the parser every time new input arrives, so I'd like to run it interactively: read input from stdin and print results to stdout. However, the instructions on the official website under "Can I have the parser run as a filter?" don't seem compatible with options such as -port.
I know that CoreNLP can be run as a server, but it cannot accept POS-tagged text as input, so I won't use it.
Here is what I'm trying:
import queue
import subprocess
import threading

class myThread(threading.Thread):
    def __init__(self, inQueue, outQueue):
        threading.Thread.__init__(self)
        self.cmd = ['java.exe',
                    '-mx4g',
                    '-cp', '*',
                    'edu.stanford.nlp.parser.lexparser.LexicalizedParser',
                    '-model', 'edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz',
                    '-sentences', 'newline',
                    '-outputFormat', 'conll2007',
                    '-tokenized',
                    '-tagSeparator', '/',
                    '-tokenizerFactory', 'edu.stanford.nlp.process.WhitespaceTokenizer',
                    '-tokenizerMethod', 'newCoreLabelTokenizerFactory',
                    '-encoding', 'utf8']
        self.subp = subprocess.Popen(self.cmd, stdin=subprocess.PIPE,
                                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        self.inQueue = inQueue
        self.outQueue = outQueue

    def run(self):
        while True:
            rid, sentence = self.inQueue.get()
            print(u"Received sentence %s" % sentence)
            sentence = sentence.replace("\n", "")
            self.subp.stdin.write((sentence + u'\n').encode('utf8'))
            self.subp.stdin.flush()
            print("start readline")
            # Note: conll2007 output spans several lines per sentence,
            # so a single readline() only captures the first line.
            result = self.subp.stdout.readline()
            print("end readline")
            print(result)
            self.outQueue.put((rid, result))
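As a sanity check that the queue-and-pipe plumbing itself works, here is a self-contained reduction of the same pattern with `cat` (a POSIX stand-in that simply echoes each line back) in place of the java command; `FilterThread` is a hypothetical name, not part of any Stanford toolkit:

```python
import queue
import subprocess
import threading

class FilterThread(threading.Thread):
    """Feed sentences from inQueue to a line-based filter subprocess."""
    def __init__(self, cmd, inQueue, outQueue):
        threading.Thread.__init__(self, daemon=True)
        self.subp = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                                     stdout=subprocess.PIPE)
        self.inQueue = inQueue
        self.outQueue = outQueue

    def run(self):
        while True:
            rid, sentence = self.inQueue.get()
            line = sentence.replace("\n", "") + "\n"
            self.subp.stdin.write(line.encode("utf8"))
            self.subp.stdin.flush()
            # One line in, one line out -- true for cat, not for conll2007.
            self.outQueue.put((rid, self.subp.stdout.readline().decode("utf8")))

inQ, outQ = queue.Queue(), queue.Queue()
FilterThread(["cat"], inQ, outQ).start()   # "cat" echoes lines, like a trivial parser
inQ.put((1, "我/PN 爱/VV 你/PN"))
rid, result = outQ.get()
print(rid, result.strip())
```

If this echoes correctly but the real parser hangs on readline, the problem is in the JVM invocation or the multi-line output format, not the threading.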
I think you're confusing things a bit. Both CoreNLP and Stanford Parser have an option to run as a command-line filter, reading from stdin and writing to stdout. However, only CoreNLP separately provides a webservice implementation.
Options like -port only make sense for the latter.
So I agree that you have a valid use case (wanting to input pre-tagged text), but at present there isn't webservice support for it. The easiest path forward would be to write a simple webservice implementation for the parser. For us, it could happen sometime, but there are a bunch of other current priorities. Anyone else is welcome to write one. :)
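For anyone tempted to try, a minimal sketch of such a webservice using only Python's standard library, with `cat` as a hypothetical stand-in for the actual java LexicalizedParser command line (swap in the real command, and read the full multi-line parse rather than a single line):

```python
import subprocess
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in; replace with the java LexicalizedParser command line.
PARSER_CMD = ["cat"]

proc = subprocess.Popen(PARSER_CMD, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
lock = threading.Lock()  # one parser process, so serialize requests

class ParseHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        sentence = self.rfile.read(length).decode("utf8").replace("\n", "")
        with lock:
            proc.stdin.write((sentence + "\n").encode("utf8"))
            proc.stdin.flush()
            result = proc.stdout.readline()  # real conll2007 output is multi-line
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(result)

    def log_message(self, *args):
        pass  # keep the example quiet

server = HTTPServer(("127.0.0.1", 0), ParseHandler)  # port 0: pick a free port
```

Start it with `server.serve_forever()` (the chosen port is `server.server_address[1]`) and POST a tagged sentence to it as the request body.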