I installed Pydoop and am trying to run MapReduce jobs. As a dry run, I tried executing the word count examples `wordcount_minimal.py` and `wordcount_full.py`. Both of them hang at the map phase. At the end of the stderr, I find one of these messages, depending on which script I ran:
```
module 'wordcount_minimal' has no attribute 'main'
```

or

```
module 'wordcount_full' has no attribute 'main'
```
I executed the job using the command:

```
pydoop submit --upload-file-to-cache wordcount_full.py wordcount_full hdfs_input_dir hdfs_output_dir
```
I am unable to find the reason behind this. Any idea what could be causing it?
I was able to execute the example with `pydoop script`, using plain `map` and `reduce` functions, and it completed successfully. But with `pydoop submit` I run into this issue. Not sure if I am missing something.
PS: I have a two-node cluster running Hortonworks HDP 2.6.5, and Pydoop is installed on both nodes.
By default, `pydoop submit` expects an entry point called `__main__`, but you can modify this via `--entry-point`. For instance, if your code is:

```python
class Mapper ...

class Reducer ...

def run():
    pipes.run_task(pipes.Factory(Mapper, Reducer))
```

you can run it via `pydoop submit --entry-point run ...`
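For reference, here is a fuller sketch of such a module, modeled on Pydoop's bundled word count examples. It assumes the Pydoop 2.x MapReduce API (`pydoop.mapreduce.api` and `pydoop.mapreduce.pipes`); the `count_words` and `total` helpers are my own additions, factored out so the core logic can be checked without a Hadoop install:

```python
# Word-count logic as pure helpers (testable anywhere), plus the Pydoop
# wiring that `pydoop submit --entry-point run` would invoke on the cluster.

def count_words(line):
    # Map-side logic: one (word, 1) pair per whitespace-separated token.
    return [(w, 1) for w in line.split()]

def total(counts):
    # Reduce-side logic: sum the counts collected for one word.
    return sum(counts)

try:
    import pydoop.mapreduce.api as api
    import pydoop.mapreduce.pipes as pipes

    class Mapper(api.Mapper):
        def map(self, context):
            for word, one in count_words(context.value):
                context.emit(word, one)

    class Reducer(api.Reducer):
        def reduce(self, context):
            context.emit(context.key, total(context.values))

    def run():
        pipes.run_task(pipes.Factory(Mapper, Reducer))
except ImportError:
    pass  # Pydoop is only needed where the task actually runs
```

With this layout, submitting with `--entry-point run` tells Pydoop to call `run()` instead of looking for the default entry point, which is what the `has no attribute` error was complaining about.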