I'm trying to get Stanford NER working with Python through NLTK. I followed some instructions on the web but got this error message: "NLTK was unable to find the java file! Use software specific configuration paramaters or set the JAVAHOME environment variable." What is wrong? Thank you!
from nltk.tag.stanford import StanfordNERTagger
from nltk.tokenize import word_tokenize
model = r'C:\Stanford\NER\classifiers\english.muc.7class.distsim.crf.ser.gz'
jar = r'C:\Stanford\NER\stanford-ner-3.9.1.jar'
ner_tagger = StanfordNERTagger(model, jar, encoding='utf-8')
text = 'While in France, Christine Lagarde discussed short-term stimulus ' \
'efforts in a recent interview with the Wall Street Journal.'
words = word_tokenize(text)
classified_words = ner_tagger.tag(words)
I found the solution on the web: NLTK has to be told where the Java executable is. Replace the path below with the path to your own java.exe.
import os
java_path = "C:/../../jdk1.8.0_101/bin/java.exe"
os.environ['JAVAHOME'] = java_path
or:
import nltk
nltk.internals.config_java('C:/../../jdk1.8.0_101/bin/java.exe')
Source: https://tianyouhu.wordpress.com/2016/09/01/problem-of-nltk-with-stanfordtokenizer/
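If you would rather not hard-code the JDK path, here is a rough sketch that looks for the Java executable first and only then hands it to NLTK. The helper names find_java and configure_nltk_java are my own (hypothetical, not part of NLTK), and the lookup order (JAVAHOME, then JAVA_HOME, then PATH) is an assumption that may not match what NLTK does internally. Run it before creating the StanfordNERTagger.

```python
import os
import shutil


def find_java(candidates=("JAVAHOME", "JAVA_HOME")):
    """Return a path to a Java executable, or None if none is found.

    Hypothetical helper: checks the given environment variables first,
    then falls back to whatever "java" resolves to on PATH.
    """
    exe = "java.exe" if os.name == "nt" else "java"
    for var in candidates:
        value = os.environ.get(var)
        if not value:
            continue
        # The variable may hold the executable itself or a JDK/JRE directory.
        for path in (value,
                     os.path.join(value, exe),
                     os.path.join(value, "bin", exe)):
            if os.path.isfile(path):
                return path
    return shutil.which("java")


def configure_nltk_java():
    """Point NLTK at the detected executable; raise if none is found."""
    import nltk  # imported here so find_java() works without NLTK installed

    java = find_java()
    if java is None:
        raise RuntimeError(
            "No Java executable found; install a JDK/JRE or set JAVAHOME."
        )
    nltk.internals.config_java(java)
    return java
```

Call configure_nltk_java() once at the top of the script. If it raises, Java is either not installed or not visible to Python, which is the same situation the NLTK error message is describing.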