StanfordPOSTagger in Python: "Could not find or load main class"

I recently started learning the nltk package by following http://textminingonline.com/dive-into-nltk-part-v-using-stanford-text-analysis-tools-in-python, but I ran into a problem when running Java code from Python:

import os
# Raw string, so the backslashes in the Windows path are not treated as escapes
java_path = r"C:\Program Files (x86)\Java\jre1.8.0_121\bin\java.exe"
os.environ['JAVAHOME'] = java_path
os.environ['JAVAHOME']

This printed:

'C:\\Program Files (x86)\\Java\\jre1.8.0_121\\bin\\java.exe'

Then I ran the nltk code:

import nltk
from nltk.tag.stanford import StanfordPOSTagger
english_postagger = StanfordPOSTagger('models/english-bidirectional-distsim.tagger', 'stanford-postagger.jar')
# tag() expects a list of tokens, not a raw string
english_postagger.tag('hi'.split())

However, I got:

`Error: Could not find or load main class edu.stanford.nlp.tagger.maxent.MaxentTagger`

I checked the contents of 'stanford-postagger.jar', and the MaxentTagger class file is there.

How can I set the class path correctly, or is there another way to solve this problem? P.S. I have no experience with Java, only Python.

Solution

The problem is that Java cannot find the jars, so this is a CLASSPATH issue. I'm not positive this will work with nltk, but I've seen previous answers where setting os.environ["CLASSPATH"] = "/path/to/stanford-corenlp-full-2016-10-31" solves this.
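As a minimal sketch of that suggestion, the snippet below checks that a directory actually contains jar files before putting it on CLASSPATH. The helper name use_stanford_dir is hypothetical, and whether nltk then picks the jars up is an assumption I have not verified.

```python
import os


def use_stanford_dir(stanford_dir):
    """Set CLASSPATH to stanford_dir after checking that it holds at least one jar.

    Hypothetical helper, not part of nltk or CoreNLP.
    """
    stanford_dir = os.path.abspath(stanford_dir)
    # os.listdir raises FileNotFoundError itself if the directory does not exist
    has_jar = any(name.endswith(".jar") for name in os.listdir(stanford_dir))
    if not has_jar:
        raise FileNotFoundError("no .jar files in " + stanford_dir)
    os.environ["CLASSPATH"] = stanford_dir
    return stanford_dir
```

Call it once, before creating the tagger, for example use_stanford_dir("/path/to/stanford-corenlp-full-2016-10-31"). Failing early on an empty or missing directory is easier to debug than a Java "Could not find or load main class" message.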

You can download Stanford CoreNLP 3.7.0 from here:

http://stanfordnlp.github.io/CoreNLP/download.html

If you want to use our tools in Python, I would recommend using the Stanford CoreNLP 3.7.0 server and making small server requests (or using the stanza library).
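A small server request could look roughly like the sketch below. It assumes a CoreNLP server is already running on localhost port 9000; the function names (build_request, extract_tags, pos_tag) are made up for illustration, and only the standard library is used.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the server was started from the CoreNLP directory, e.g. with
#   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000


def build_request(text, server_url="http://localhost:9000"):
    """Build a POST request asking the server for POS tags as JSON."""
    props = {"annotators": "tokenize,ssplit,pos", "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    return urllib.request.Request(
        server_url + "/?" + query,
        data=text.encode("utf-8"),
        method="POST",
    )


def extract_tags(annotation):
    """Flatten the server's JSON annotation into (word, tag) pairs."""
    return [
        (token["word"], token["pos"])
        for sentence in annotation["sentences"]
        for token in sentence["tokens"]
    ]


def pos_tag(text, server_url="http://localhost:9000"):
    """Send text to the running server and return its POS tags."""
    with urllib.request.urlopen(build_request(text, server_url)) as response:
        return extract_tags(json.load(response))
```

With the server up, pos_tag("hi there") returns a list of (word, tag) tuples. The models are loaded once by the server, so repeated requests do not pay the startup cost again.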

If you use nltk, what I believe happens is that Python just calls our Java code with subprocess, and this can be very inefficient because each separate call reloads all of the models.
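If you do stay with nltk, one way to limit the reloading is to pass all sentences in a single tag_sents call rather than calling tag in a loop. The sketch below assumes tag_sents runs Java once per call, which is my understanding of nltk's Stanford wrapper but not something I have checked in every version. CountingTagger is a hypothetical stand-in so the example runs without Java.

```python
def tag_in_one_call(tagger, sentences):
    """Tokenize on whitespace and tag all sentences with one tag_sents call."""
    tokenized = [sentence.split() for sentence in sentences]
    return tagger.tag_sents(tokenized)


class CountingTagger:
    """Hypothetical stand-in for StanfordPOSTagger that counts simulated launches."""

    def __init__(self):
        self.launches = 0

    def tag_sents(self, sentences):
        # One simulated Java launch per tag_sents call; every token gets a dummy tag
        self.launches += 1
        return [[(token, "X") for token in sentence] for sentence in sentences]

    def tag(self, tokens):
        # Tagging a single sentence still costs a full launch
        return self.tag_sents([tokens])[0]
```

Tagging ten sentences through tag_in_one_call costs the stand-in one launch, while calling tag ten times costs ten. With the real tagger each launch also reloads the model.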

Here is a previous answer I gave which describes this more thoroughly:

cannot use pycorenlp for python3.5 through terminal