I have recently been trying to learn the nltk package through http://textminingonline.com/dive-into-nltk-part-v-using-stanford-text-analysis-tools-in-python. However, I ran into a problem running Java code from Python:
import os
java_path = r"C:\Program Files (x86)\Java\jre1.8.0_121\bin\java.exe"  # raw string, so backslashes need no escaping
os.environ['JAVAHOME'] = java_path
os.environ['JAVAHOME']
The output was:
'C:\\Program Files (x86)\\Java\\jre1.8.0_121\\bin\\java.exe'
Then I ran the nltk code:
import nltk
from nltk.tag.stanford import StanfordPOSTagger
english_postagger = StanfordPOSTagger('models/english-bidirectional-distsim.tagger', 'stanford-postagger.jar')
english_postagger.tag(['hi'])  # tag() expects a list of tokens, not a bare string
However, I got this error:
`Error: Could not find or load main class edu.stanford.nlp.tagger.maxent.MaxentTagger`
I looked through the contents of 'stanford-postagger.jar', and the MaxentTagger class file was there.
How can I set the class path correctly, or is there another way to solve this problem? P.S.: I have experience with Python, but not with Java.
The issue is that Java cannot find the jars, so this is a CLASSPATH problem. I'm not positive this will work with nltk, but I've seen previous answers where setting os.environ["CLASSPATH"] = "/path/to/stanford-corenlp-full-2016-10-31" solves this.
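As a minimal sketch of that suggestion: the helper name set_classpath is hypothetical, the path is a placeholder you would replace with your own download location, and whether nltk picks the variable up is an assumption I have not verified. It would need to run before the tagger is constructed.

```python
import os

def set_classpath(*entries):
    """Join the given entries with the platform's path separator
    (';' on Windows, ':' elsewhere) and export them as CLASSPATH."""
    classpath = os.pathsep.join(entries)
    os.environ["CLASSPATH"] = classpath
    return classpath

# Placeholder path: point this at the directory where you unpacked the download.
set_classpath("/path/to/stanford-corenlp-full-2016-10-31")
```

Several entries can be passed at once, e.g. one directory for the jars and one for the models, and they are joined with the separator that Java expects on your platform.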
You can download Stanford CoreNLP 3.7.0 from here:
http://stanfordnlp.github.io/CoreNLP/download.html
If you want to use our tools in Python, I would recommend using the Stanford CoreNLP 3.7.0 server and making small server requests (or using the stanza library).
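A rough sketch of what such a server request could look like, using only the standard library. It assumes a CoreNLP server is already running locally on port 9000 and that the JSON reply has a "sentences" list whose "tokens" carry "word" and "pos" fields; the function names build_url, extract_tags and pos_tag are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

def build_url(base_url, annotators="tokenize,ssplit,pos"):
    """Encode the annotator settings as the 'properties' query parameter."""
    props = {"annotators": annotators, "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    return base_url + "/?" + query

def extract_tags(response):
    """Flatten the parsed JSON reply into a list of (word, tag) pairs."""
    return [(token["word"], token["pos"])
            for sentence in response["sentences"]
            for token in sentence["tokens"]]

def pos_tag(text, base_url="http://localhost:9000"):
    """POST the raw text to the server and return (word, tag) pairs.
    Assumes the server is already running at base_url."""
    request = urllib.request.Request(build_url(base_url), data=text.encode("utf-8"))
    with urllib.request.urlopen(request) as reply:
        return extract_tags(json.loads(reply.read().decode("utf-8")))

# Requires a running server, so it is left commented out here:
# print(pos_tag("hi"))
```

Because the server stays up between requests, the models are loaded once rather than on every call.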
If you use nltk, what I believe happens is that Python just calls our Java code with subprocess, and this can actually be very inefficient since distinct calls reload all of the models.
Here is a previous answer I gave which describes this more thoroughly: