
AWS Data Pipeline - Task Runner does not stay active


I am attempting to run Task Runner on an Amazon Linux EC2 instance (ami-6869aa05) to execute AWS Data Pipeline shell command tasks.

I successfully created the pipeline, connected to the instance over SSH, installed and started Task Runner, and ran the pipeline successfully on the first activation. Subsequent pipeline jobs, however, get stuck in "waiting for runner" status, and when I look into the pipeline's dependencies, it shows that the worker group is not set up.

I used the following command to start Task Runner (S3 bucket and folder names removed):

java -jar TaskRunner-1.0.jar --config ~/credentials.json --workerGroup=wg-01020 --region=us-east-1 --logUri=s3://**bucket-name**/**folder-name**
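
For reference, the credentials.json I pass with --config follows the access-id / private-key format from the Task Runner setup documentation (the values below are placeholders):

{
  "access-id": "MyAccessKeyID",
  "private-key": "MySecretAccessKey"
}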

This results in the following output:

log4j:WARN No appenders could be found for logger (amazonaws.datapipeline.objects.PluginModule).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Starting log pusher...
Log Pusher Started. Region: us-east-1, LogUri: s3://**bucket-name**/**folder-name**
Build info: commit=unknown, timestamp=2016-07-18 14:51:53 UTC
Initializing drivers...
Starting task runner...

The AWS documentation for Task Runner says that "When Task Runner is active, it prints the path to where log files are written in the terminal window. The following is an example."...

Logging to /Computer_Name/.../output/logs

...but I have yet to see this printed. This leads me to believe that exiting the terminal shuts down Task Runner, which would explain why subsequent pipeline jobs get stuck in "waiting for runner" status.
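
Reconnecting over SSH and looking for a surviving Java process should confirm whether Task Runner outlives the session; something like the following (the [T] in the pattern keeps grep from matching its own command line):

ps aux | grep [T]askRunner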

Any help would be much appreciated.


Solution

  • I was able to exit the terminal without terminating Task Runner by appending & disown to the command. The & backgrounds the process and disown removes it from the shell's job table, so it no longer receives a hangup signal when the SSH session closes.

    java -jar TaskRunner-1.0.jar --config ~/credentials.json --workerGroup=wg-01020 --region=us-east-1 --logUri=s3://**bucket-name**/**folder-name** & disown
    

    This still does not produce the Logging to /Computer_Name/.../output/logs output referenced above, but I no longer have to leave a terminal window open, and the data pipeline jobs have been completing without issue.
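
    As an alternative sketch (assuming a bash shell), nohup should achieve the same detachment while also redirecting console output to a local file, so any "Logging to ..." line would be captured; the task-runner.out filename here is just an arbitrary choice:

    # nohup ignores the hangup signal on session close; stdout/stderr go to a local file
    nohup java -jar TaskRunner-1.0.jar --config ~/credentials.json --workerGroup=wg-01020 --region=us-east-1 --logUri=s3://**bucket-name**/**folder-name** > task-runner.out 2>&1 &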