I'm trying to submit a Python script that imports numpy to AWS EMR, but I get:
ImportError: No module named numpy
I tried using one of the answers here: No module named numpy when spark-submitting. I created a bootstrap_actions.sh script that includes
sudo yum install python-numpy python-scipy -y
and I run the script when I create the cluster, but I still get the import error. How can I get import numpy to work?
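For Amazon EMR you need to use bootstrap actions, which run on every node when the cluster is created. Installing from the console only changes the master node and not the task nodes. If you launch the cluster through mrjob, the configuration would look like this: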
runners:
  emr:
    bootstrap:
    - sudo yum install -y python27-numpy
I am assuming that you are using Python 2.7. If you are using Python 3.x, the link below has examples of installing with pip in the bootstrap. I am also assuming that you are using a recent EMR AMI.
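To check whether the bootstrap action actually took effect, you could run a small script like the one below on a node, using the same interpreter your job uses. This is only a rough sketch: the helper name module_available is made up for illustration, and it relies only on importlib.import_module from the standard library, which exists in both Python 2.7 and 3.x.

```python
import importlib
import sys


def module_available(name):
    """Return True if `name` can be imported by this interpreter."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        # Covers "No module named ..." on both Python 2.7 and 3.x.
        return False


if __name__ == "__main__":
    # Print the interpreter path too: a package installed for a different
    # Python than the one running the job will still fail to import.
    print("interpreter: %s" % sys.executable)
    print("numpy available: %s" % module_available("numpy"))
```

If this reports False on a task node but True on the master, the package was most likely installed on the master only, or for a different Python version than the one the job runs under.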