Search code examples
pythonhadoophadoop-streamingpytz

Error importing pytz module during hadoop streaming process


I am running a python script on hadoop and It gives the following error

ImportError: No module named pytz

When I run the python script on terminal, it executes perfectly. Ideally it should not happen because hadoop uses the same python version and libraries which the system does. Any idea ?


Solution

  • If you are using any python package in stream job, it need to be installed on each individual node of the cluster. Other option is to zip the package in a tarball and send it along with -file option. Refer to this answer for more details - How can I include a python package with Hadoop streaming job?