
Run cron task on AWS EMR master node


How can I run a periodic job in the background on an EMR cluster? I have script.sh, which sets up a cron job, and application.py, both in S3, and I want to launch the cluster with this command:

aws emr create-cluster \
--name "Test cluster" \
--release-label emr-5.12.0 \
--applications Name=Hive Name=Pig Name=Ganglia Name=Spark \
--use-default-roles \
--ec2-attributes KeyName=myKey \
--instance-type m3.xlarge \
--instance-count 3 \
--steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,\
Jar=s3://region.elasticmapreduce/libs/script-runner/script-runner.jar,\
Args=["s3://mybucket/script-path/script.sh"]

Finally, I want the cron job from script.sh to execute application.py. I don't understand how to install cron on the master node. The Python file also needs some libraries, and they should be installed too.


Solution

  • You need to SSH into the master node, and then perform the crontab setup from there, not on your local machine:

    https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-connect-master-node-ssh.html

    Connect to the Master Node Using SSH

    Secure Shell (SSH) is a network protocol you can use to create a secure connection to a remote computer. After you make a connection, the terminal on your local computer behaves as if it is running on the remote computer. Commands you issue locally run on the remote computer, and the command output from the remote computer appears in your terminal window.

    When you use SSH with AWS, you are connecting to an EC2 instance, which is a virtual server running in the cloud. When working with Amazon EMR, the most common use of SSH is to connect to the EC2 instance that is acting as the master node of the cluster.

    Using SSH to connect to the master node gives you the ability to monitor and interact with the cluster. You can issue Linux commands on the master node, run applications such as Hive and Pig interactively, browse directories, read log files, and so on. You can also create a tunnel in your SSH connection to view the web interfaces hosted on the master node. For more information, see View Web Interfaces Hosted on Amazon EMR Clusters.
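
Once connected (for example with ssh -i ~/myKey.pem hadoop@<master-public-dns>, where the key file path is an assumption), the crontab setup could look roughly like the sketch below. Everything in it is hypothetical and not taken from the question: the S3 location of application.py, the local paths, the 15-minute schedule, the PY_LIBS list, and the use of the default python interpreter. It also assumes cron is already present on the master node, which is worth checking with "which crontab" before relying on it.

```shell
#!/bin/sh
# Sketch only: register a cron job for application.py on the EMR master node.
set -eu

# Hypothetical values; adjust to your bucket, paths, and schedule.
APP_S3="${APP_S3:-s3://mybucket/script-path/application.py}"
APP_PATH="${APP_PATH:-/home/hadoop/application.py}"
LOG_PATH="${LOG_PATH:-/home/hadoop/application.log}"
SCHEDULE="${SCHEDULE:-*/15 * * * *}"
PY_LIBS="${PY_LIBS:-}"   # space-separated pip packages, empty by default

# Build the crontab line for the job.
cron_line() {
  printf '%s python %s >> %s 2>&1\n' "$SCHEDULE" "$APP_PATH" "$LOG_PATH"
}

# Read an existing crontab on stdin and write it back with the job
# present exactly once, so re-running the script does not duplicate it.
merge_crontab() {
  line="$(cron_line)"
  grep -vxF -- "$line" || true   # grep exits 1 when nothing is left
  printf '%s\n' "$line"
}

# Copy the application from S3 and install the libraries it needs.
prepare_app() {
  aws s3 cp "$APP_S3" "$APP_PATH"
  if [ -n "$PY_LIBS" ]; then
    # Word splitting is intended: PY_LIBS is a list of package names.
    sudo pip install $PY_LIBS
  fi
}

# crontab -l fails when the user has no crontab yet, hence the "|| true".
install_cron_job() {
  { crontab -l 2>/dev/null || true; } | merge_crontab | crontab -
}

# Nothing is changed unless --install is passed on the master node.
if [ "${1:-}" = "--install" ]; then
  prepare_app
  install_cron_job
fi
```

Run without arguments, the script only defines the functions, so you can call cron_line to inspect the entry first. Running it with --install on the master node copies the file, installs the libraries, and updates the hadoop user's crontab; crontab -l afterwards should show the new line.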