
Using Pyinstaller with NLTK results in error: can't find nltk_data


I am attempting to export a simple GUI that uses NLTK as an exe, with Python 3.6 on Windows 10.

When I run PyInstaller to freeze my simple program as an exe, I get the error: Unable to find "c:\users\usr\nltk_data" when adding binary and data files.

Even after I copied the nltk_data folder to that location, I get the same error for a different entry in nltk.data.path: "c:\users\usr\appdata\local\programs\python\python36\nltk_data"

import tkinter as tk
from nltk.corpus import stopwords
sw = stopwords.words('english')

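counter = 0

def counter_label(label):
  # Show the next stopword every second.
  def count():
    global counter
    # Wrap around so the index never runs past the end of the list.
    counter = (counter + 1) % len(sw)
    label.config(text=sw[counter])
    label.after(1000, count)
  count()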


root = tk.Tk()
root.title("Counting Seconds")
label = tk.Label(root, fg="dark green")
label.pack()
counter_label(label)
button = tk.Button(root, text='Stop', width=25, command=root.destroy)
button.pack()
root.mainloop()

For PyInstaller, I run:

pyinstaller --onefile --windowed test_tkinter.py

Solution

  • This appears to be a known bug in PyInstaller's hook for nltk. An easy way to fix it is to edit this file:

    <PythonPath>/Lib/site-packages/PyInstaller/hooks/hook-nltk.py
    

    Then comment out the lines that iterate over nltk.data.path and add your nltk_data path explicitly:

    #-----------------------------------------------------------------------------
    # Copyright (c) 2005-2018, PyInstaller Development Team.
    #
    # Distributed under the terms of the GNU General Public License with exception
    # for distributing bootloader.
    #
    # The full license is in the file COPYING.txt, distributed with this software.
    #-----------------------------------------------------------------------------
    
    
    # hook for nltk
    import nltk
    from PyInstaller.utils.hooks import collect_data_files
    
    # add datas for nltk
    datas = collect_data_files('nltk', False)
    
    # loop through the data directories and add them
    # for p in nltk.data.path:
    #     datas.append((p, "nltk_data"))
    
    datas.append(("<path_to_nltk_data>", "nltk_data"))
    
    # nltk.chunk.named_entity should be included
    hiddenimports = ["nltk.chunk.named_entity"]
    

    Remember to replace <path_to_nltk_data> with the actual path to your nltk_data folder.
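
As an alternative to hard-coding the path, the loop in the hook could skip search paths that do not exist on the machine. The following is a minimal sketch, assuming the error is caused by nonexistent directories in nltk.data.path (as the error message suggests). The helper name existing_data_dirs and the example candidate paths are hypothetical and are not part of PyInstaller or NLTK.

```python
import os


def existing_data_dirs(candidate_paths, dest="nltk_data"):
    """Return (source, dest) tuples for the candidates that are real directories."""
    return [(p, dest) for p in candidate_paths if os.path.isdir(p)]


if __name__ == "__main__":
    # Hypothetical candidates; inside the hook these would come from nltk.data.path.
    candidates = [os.path.expanduser("~/nltk_data"), os.getcwd()]
    for source, dest in existing_data_dirs(candidates):
        print(source, "->", dest)
```

Inside the hook, the result could be added with datas.extend(existing_data_dirs(nltk.data.path)) in place of the commented-out loop, so that only directories that are actually present get bundled.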