python nltk stanford-nlp named-entity-recognition

Python NLTK: Stanford NER tagger error message: NLTK was unable to find the java file


I'm trying to get Stanford NER working with Python. I followed some instructions on the web, but got this error message: "NLTK was unable to find the java file! Use software specific configuration paramaters or set the JAVAHOME environment variable." What is wrong? Thank you!

from nltk.tag.stanford import StanfordNERTagger
from nltk.tokenize import word_tokenize

model = r'C:\Stanford\NER\classifiers\english.muc.7class.distsim.crf.ser.gz'
jar = r'C:\Stanford\NER\stanford-ner-3.9.1.jar'

ner_tagger = StanfordNERTagger(model, jar, encoding='utf-8')

text = 'While in France, Christine Lagarde discussed short-term stimulus ' \
       'efforts in a recent interview with the Wall Street Journal.'

words = word_tokenize(text)
classified_words = ner_tagger.tag(words)

Solution

  • Found the solution on the web: NLTK cannot locate the Java executable, so set its path explicitly before creating the tagger. Replace the path with your own.

    import os

    java_path = "C:/../../jdk1.8.0_101/bin/java.exe"
    os.environ['JAVAHOME'] = java_path
    

    or:

    import nltk
    
    nltk.internals.config_java('C:/../../jdk1.8.0_101/bin/java.exe')
    

    Source: https://tianyouhu.wordpress.com/2016/09/01/problem-of-nltk-with-stanfordtokenizer/
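
    If you are not sure where Java lives on your machine, a small lookup can save some guessing. The sketch below is only an illustration: find_java is a hypothetical helper (not part of NLTK) that tries an explicit path, then the JAVAHOME and JAVA_HOME environment variables, then whatever "java" resolves to on PATH. It assumes the usual JDK layout with the binary under a bin directory.

```python
import os
import shutil


def find_java(explicit_path=None):
    """Return a path to a java executable, or None if none is found.

    Hypothetical helper for illustration; it is not part of NLTK.
    """
    candidates = [
        explicit_path,
        os.environ.get("JAVAHOME"),
        os.environ.get("JAVA_HOME"),
    ]
    for candidate in candidates:
        if not candidate:
            continue
        # The value may already be the executable itself.
        if os.path.isfile(candidate):
            return candidate
        # Or it may be the JDK directory, with the binary under bin/.
        for name in ("java.exe", "java"):
            nested = os.path.join(candidate, "bin", name)
            if os.path.isfile(nested):
                return nested
    # Fall back to whatever "java" resolves to on PATH (may be None).
    return shutil.which("java")


java_path = find_java()
if java_path is None:
    print("No java executable found; install a JDK or pass its path explicitly.")
else:
    # Same setting as in the answer, but with a path that was checked to exist.
    os.environ["JAVAHOME"] = java_path
    print("Using Java at:", java_path)
```

    Run this before creating the StanfordNERTagger. If it prints that no executable was found, the tagger would most likely fail with the same error, so fix the Java installation or pass the path to find_java first.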