I installed Pydoop and am trying to run MapReduce jobs. As a dry run, I tried executing the word count examples `wordcount_minimal.py` and `wordcount_full.py`. Both of them hang at the map phase. At the end of the stderr, I find one of these messages, depending on which script I ran:
```
module 'wordcount_minimal' has no attribute 'main'
```

or

```
module 'wordcount_full' has no attribute 'main'
```
I executed the job using the command:

```
pydoop submit --upload-file-to-cache wordcount_full.py wordcount_full hdfs_input_dir hdfs_output_dir
```
I am unable to find the reason behind this. Any idea what could be causing it?
I was able to execute the example with `pydoop script`, using plain `map` and `reduce` functions, and it completed successfully. But with `pydoop submit` I run into this issue. Not sure if I am missing something.
PS: I have a two-node cluster running Hortonworks HDP 2.6.5, and Pydoop is installed on both nodes.
By default, `pydoop submit` expects an entry point called `__main__`, but you can modify this via `--entry-point`. For instance, if your code is:

```python
class Mapper ...

class Reducer ...

def run():
    pipes.run_task(pipes.Factory(Mapper, Reducer))
```

you can run it via `pydoop submit --entry-point run ...`
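For reference, here is a fuller sketch of such a module, modeled on Pydoop's bundled word count examples. It assumes the Pydoop 2.x MapReduce API (`pydoop.mapreduce.api` and `pydoop.mapreduce.pipes`); the `count_words` and `total` helpers are my own additions, factored out so the core logic can be checked without a Hadoop install:

```python
# Word-count logic as pure helpers (testable anywhere), plus the Pydoop
# wiring that `pydoop submit --entry-point run` would invoke on the cluster.

def count_words(line):
    # Map-side logic: one (word, 1) pair per whitespace-separated token.
    return [(w, 1) for w in line.split()]

def total(counts):
    # Reduce-side logic: sum the counts collected for one word.
    return sum(counts)

try:
    import pydoop.mapreduce.api as api
    import pydoop.mapreduce.pipes as pipes

    class Mapper(api.Mapper):
        def map(self, context):
            for word, one in count_words(context.value):
                context.emit(word, one)

    class Reducer(api.Reducer):
        def reduce(self, context):
            context.emit(context.key, total(context.values))

    def run():
        pipes.run_task(pipes.Factory(Mapper, Reducer))
except ImportError:
    pass  # Pydoop is only needed where the task actually runs
```

With this layout, submitting with `--entry-point run` tells Pydoop to call `run()` instead of looking for the default entry point, which is what the `has no attribute` error was complaining about.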