python, parsing, installation, part-of-speech

Invalid parameter file in treetaggerwrapper (Python)


I have installed treetaggerwrapper for Python from PyPI. I have placed treetaggerwrapper.py and treetaggerpoll.py in the TreeTagger directory, and the english.par file in its 'lib' subdirectory. When I instantiate the tagger object, I get the error below.

tagger= treetaggerwrapper.TreeTagger(TAGLANG='en')
----> 1 tagger= treetaggerwrapper.TreeTagger(TAGLANG='en')

c:\users\kj\appdata\local\programs\python\python36\lib\site-packages\treetaggerwrapper.py in __init__(self, **kargs)
   1000         logger.debug("Using treetaggerwrapper.py from %s", osp.abspath(__file__))
   1001         self._set_language(kargs)
-> 1002         self._set_tagger(kargs)
   1003         self._set_preprocessor(kargs)
   1004         # Note: TreeTagger process is started later, when really needed.

c:\users\kj\appdata\local\programs\python\python36\lib\site-packages\treetaggerwrapper.py in _set_tagger(self, kargs)
   1087                              self.tagparfile)
   1088                 raise TreeTaggerError("TreeTagger parameter file invalid: " + \
-> 1089                                       self.tagparfile)
   1090         logger.info("tagparfile=%s", self.tagparfile)
   1091 

TreeTaggerError: TreeTagger parameter file invalid: english-utf8.par

When I run TreeTagger from cmd using "tag-english", I get the expected output. I have also added the TreeTagger directory to the PATH. Can someone point out what is wrong here?


Solution

  • I had the same problem a little while ago. As the error message shows, treetaggerwrapper expects the parameter file to be named english-utf8.par. If you extracted the parameter file downloaded from the TreeTagger site and didn't rename it, it is most likely still called english.par, so the wrapper does not find the file it is looking for in the 'lib' directory.

    You can either change the expected file name in the wrapper code or rename your parameter file to match. (The parameter files should already be encoded in UTF-8, so you only need to change the name, not the encoding.)
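
    The renaming option can be sketched in Python as below. This is a minimal sketch under some assumptions: the helper name ensure_utf8_par_name is hypothetical (it is not part of treetaggerwrapper), and the 'lib' path shown in the usage comment is a placeholder for wherever your TreeTagger installation lives. It copies the file instead of renaming it, so that the "tag-english" command, which already works with english.par, keeps working.

```python
import shutil
from pathlib import Path


def ensure_utf8_par_name(lib_dir, language="english"):
    """Make sure <language>-utf8.par exists in lib_dir.

    Hypothetical helper, not part of treetaggerwrapper. It copies
    <language>.par to <language>-utf8.par and leaves the original
    file untouched. The file content is not re-encoded.
    """
    lib_dir = Path(lib_dir)
    source = lib_dir / f"{language}.par"
    target = lib_dir / f"{language}-utf8.par"

    # Nothing to do if the name the wrapper expects is already present.
    if target.exists():
        return target

    if not source.exists():
        raise FileNotFoundError(f"No parameter file found at {source}")

    shutil.copyfile(source, target)
    return target


# Example call (the path is an assumption, adjust it to your install):
# ensure_utf8_par_name(r"C:\TreeTagger\lib")
```

    After running this once, the same treetaggerwrapper.TreeTagger(TAGLANG='en') call should look for english-utf8.par and find it. As far as I know, the treetaggerwrapper documentation also lists a TAGPARFILE argument for pointing the wrapper at a specific parameter file, which would avoid touching the files at all; check the documentation for your installed version before relying on it.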