I am running a python script on hadoop and It gives the following error
ImportError: No module named pytz
When I run the python script on terminal, it executes perfectly. Ideally it should not happen because hadoop uses the same python version and libraries which the system does. Any idea ?
If you are using any python package in stream job, it need to be installed on each individual node of the cluster. Other option is to zip the package in a tarball and send it along with -file
option. Refer to this answer for more details - How can I include a python package with Hadoop streaming job?