Tags: python, hadoop, mapreduce, hortonworks-data-platform

Pydoop mapreduce "AttributeError: module 'wordcount_minimal' has no attribute '__main__'"


I installed Pydoop and am trying to run MapReduce jobs. As a dry run, I tried executing the word count examples wordcount_minimal.py and wordcount_full.py. Both of them hang at the map phase. At the end of stderr, I find this message, depending on which script I run:

module 'wordcount_minimal' has no attribute '__main__'

or

module 'wordcount_full' has no attribute '__main__'

I executed the job using the command:

pydoop submit --upload-file-to-cache wordcount_full.py wordcount_full hdfs_input_dir hdfs_output_dir

I'm unable to find the reason for this. Any idea what could be causing it?

I was able to execute the example from the Pydoop script using the map and reduce functions, and it completed successfully. But with the pydoop submit option, I have this issue. Not sure if I am missing something.

PS: I have a cluster with 2 nodes running Hortonworks HDP 2.6.5. Pydoop is installed on both of them.


Solution

  • By default, pydoop submit looks for an entry point called __main__ in your module, but you can change this via --entry-point. For instance, if your code is:

    import pydoop.mapreduce.api as api
    import pydoop.mapreduce.pipes as pipes

    class Mapper(api.Mapper): ...
    class Reducer(api.Reducer): ...

    def run():
        pipes.run_task(pipes.Factory(Mapper, Reducer))


    you can run it via pydoop submit --entry-point run ...
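For context, the word-count logic that such a Mapper and Reducer implement can be simulated in plain Python, with no Hadoop or Pydoop required. This is only an illustration of the map/shuffle/reduce data flow; the function names here are made up for the sketch, not part of the Pydoop API:

```python
from collections import defaultdict


def map_phase(lines):
    # Mapper: emit a (word, 1) pair for every word in every input line
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_phase(pairs):
    # Shuffle + Reducer: group pairs by key and sum the counts per word
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


if __name__ == "__main__":
    lines = ["hello world", "hello pydoop"]
    print(reduce_phase(map_phase(lines)))
    # → {'hello': 2, 'world': 1, 'pydoop': 1}
```

In a real Pydoop job, the framework drives the equivalent of these two phases across the cluster; your Mapper's map() and Reducer's reduce() methods supply only the per-record logic.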