I have been struggling with this for 2 hours now!
I created a mapper script in Python that imports one of my custom functions from another Python script.
#!/usr/bin/env python
import sys
import testImport

for line in sys.stdin:
    # skip empty lines
    if line.strip():
        words = line.strip().lower().split('\t')
        # emit: key <TAB> classified age
        print '%s\t%s' % (words[0].strip(), testImport.age_classify(int(words[1])))
This code works fine on my terminal. The problem is when I upload this mapper script to AWS Elastic MapReduce: the job fails with the error "Failed to import module testImport".
testImport is a file 'testImport.py' that contains some of my helper functions (like the age_classify function) that I need to apply to each line of standard input.
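For reference, testImport.py is roughly like this (the exact age cutoffs don't matter for this question, these are just placeholders):

# testImport.py -- helper functions used by the mapper
def age_classify(age):
    """Map a numeric age to a coarse age-group label (placeholder cutoffs)."""
    if age < 18:
        return 'minor'
    elif age < 65:
        return 'adult'
    return 'senior'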
I uploaded testImport.py to the same S3 bucket as my mapper script (the script shown above).
I also tried passing it in the arguments section when adding the 'Streaming program' step. Even after going through all the related questions, I have no clue what to do.
How can I get this done?
Any help would be really appreciated. Thank you!
As you said, you uploaded testImport.py to the same bucket as your map/reduce scripts, but EMR does not read extra files from that bucket unless you explicitly tell it to: only the mapper and reducer scripts named in the step are copied to the task nodes, which is why the import fails there. (If you want to keep the module separate, Hadoop Streaming has options for shipping support files with the job, such as -file, or on older EMR streaming steps something like -cacheFile s3n://yourbucket/testImport.py#testImport.py, where 'yourbucket' is a placeholder; I haven't verified the exact syntax on the current EMR version.)
For Java we build one fat jar containing all the related classes and run that single jar. For your Python scripts, try the equivalent: fold the helper functions into a single mapper script and a single reducer script, and run those.
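A rough sketch of that single-file mapper, assuming age_classify looks something like the placeholder above (substitute your real logic):

#!/usr/bin/env python
# Self-contained mapper for EMR streaming: helper inlined, nothing to import.
import sys

def age_classify(age):
    """Placeholder classifier -- replace with your real thresholds."""
    if age < 18:
        return 'minor'
    elif age < 65:
        return 'adult'
    return 'senior'

for line in sys.stdin:
    # skip empty lines
    if line.strip():
        words = line.strip().lower().split('\t')
        # emit: key <TAB> classified age
        print '%s\t%s' % (words[0].strip(), age_classify(int(words[1])))

You can test it locally the same way as before (e.g. cat sample_input.txt | ./mapper.py, where sample_input.txt is whatever test data you used) and then upload just this one file to S3.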