Tags: java, python, nlp, nltk, pos-tagger

ValueError: Could not find stanford-postagger.jar file for hazm library (Python NLP)


I want to run some code that needs stanford-postagger.jar, but I get this error:

  File "/usr/lib/python2.7/site-packages/nltk/internals.py", line 562, in find_jar
    (name, path_to_jar))
ValueError: Could not find stanford-postagger.jar jar file at resources/stanford-postagger.jar

How can I fix this error?

EDIT: I am using the hazm module:

from hazm import POSTagger, word_tokenize
tagger = POSTagger()
tagger.tag(word_tokenize('ما بسیار کتاب می‌خوانیم'))

and the full traceback:

Traceback (most recent call last):
  File "pyt.py", line 8, in <module>
    tagger = POSTagger()
  File "/home/vahid/dev/hazm/hazm/POSTagger.py", line 14, in __init__
    super(stanford.POSTagger, self).__init__(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/nltk/tag/stanford.py", line 42, in __init__
    verbose=verbose)
  File "/usr/lib/python2.7/site-packages/nltk/internals.py", line 562, in find_jar
    (name, path_to_jar))
ValueError: Could not find stanford-postagger.jar jar file at resources/stanford-postagger.jar

Solution

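  • Normally you would need to get the stanford-postagger.jar file from Stanford and also train your own tagger model. However, the hazm developer has kindly uploaded the resources directory that you need here: http://dl.dropboxusercontent.com/u/90405495/resources.zip

    Unzip it and save the resulting resources folder in the directory you run your script from. The path in the error message, resources/stanford-postagger.jar, is relative, so it is resolved against the current working directory.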

    For example:

    $ mkdir testdir
    $ wget https://github.com/sobhe/hazm/archive/master.zip
    $ unzip master.zip -d testdir
    $ cd testdir
    $ mv hazm-master/hazm/ .
    $ wget http://dl.dropboxusercontent.com/u/90405495/resources.zip
    $ unzip resources.zip -d .
    $ python
    Python 2.7.5+ (default, Sep 19 2013, 13:48:49) 
    [GCC 4.8.1] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import hazm
    >>> tagger = hazm.POSTagger()
    >>> tagger.tag(hazm.word_tokenize(u'ما بسیار کتاب می‌خوانیم'))
    [(u'\u0645\u0627', u'PR'), (u'\u0628\u0633\u06cc\u0627\u0631', u'ADV'), (u'\u06a9\u062a\u0627\u0628', u'N'), (u'\u0645\u06cc\u200c\u062e\u0648\u0627\u0646\u06cc\u0645', u'V')]
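
    If the error persists, it can help to confirm that the jar is really where the tagger looks for it. The following is only a minimal sketch: the relative path is copied from the error message above, while the helper name missing_resources and the REQUIRED_FILES list are hypothetical and not part of hazm or NLTK.

```python
import os

# Relative path copied from the ValueError message; it is resolved against
# the directory the script is started from.
# REQUIRED_FILES and missing_resources are illustrative names, not hazm API.
REQUIRED_FILES = [os.path.join("resources", "stanford-postagger.jar")]


def missing_resources(base_dir=None, required=REQUIRED_FILES):
    """Return the entries of `required` that are not files under `base_dir`."""
    if base_dir is None:
        base_dir = os.getcwd()
    return [
        path for path in required
        if not os.path.isfile(os.path.join(base_dir, path))
    ]


if __name__ == "__main__":
    missing = missing_resources()
    if missing:
        print("Missing under %s: %s" % (os.getcwd(), ", ".join(missing)))
    else:
        print("Found the jar; the tagger should be able to locate it.")
```

    Run it from the same directory as your script. If it reports the jar as missing, the resources folder was unzipped somewhere else or the script is being started from a different directory.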