
taskKillGracePeriodSeconds is not working for DC/OS Marathon Application?


We have set up a DC/OS (version 1.9) cluster on AWS nodes. We create a Marathon application definition with "taskKillGracePeriodSeconds" set to 60, and we catch SIGTERM in our application to shut it down gracefully. But this is not working: Marathon kills the application immediately (on Scale Down / Destroy) and does not wait 60 seconds as expected. We do get the SIGTERM callback, but the application is killed immediately after that. We have also tried starting the Mesos slave agents with the following setting in the file /var/lib/dcos/mesos-slave-common, but this does not help either: MESOS_ATTRIBUTES=executor_shutdown_grace_period:60secs;docker_stop_timeout:60secs

The DC/OS cluster agents run centos-release-7-2.1511.el7.centos.2.10.x86_64.

Has anybody been able to use taskKillGracePeriodSeconds successfully?

Please help us work this out.

Thanks.


Solution

  • Are you using Docker containers?

    As far as I remember, there was a problem with forwarding the SIGTERM signal when using process groups (= containers).

    To test this on your cluster, can you deploy an app with the following command, using only the Mesos containerizer and a taskKillGracePeriodSeconds of 10 seconds?

    trap "echo ' killing' && sleep 5 && echo 'test' && sleep 100" SIGTERM && sleep 100000
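
For reference, the test app described above could be written out roughly as in the sketch below. The file name sigterm-test.json, the app id /sigterm-test, and the cpus/mem values are assumptions chosen for illustration, not values from the question. Leaving out the "container" field is assumed here to make Marathon run the command under the Mesos containerizer rather than Docker.

```shell
#!/bin/sh
# Write a minimal Marathon app definition for the SIGTERM test.
# File name, app id and resource values are hypothetical; adjust as needed.
# The quoted heredoc delimiter keeps the JSON escapes (\") untouched.
cat > sigterm-test.json <<'EOF'
{
  "id": "/sigterm-test",
  "cmd": "trap \"echo ' killing' && sleep 5 && echo 'test' && sleep 100\" SIGTERM && sleep 100000",
  "cpus": 0.1,
  "mem": 32,
  "instances": 1,
  "taskKillGracePeriodSeconds": 10
}
EOF

# Deploy with the DC/OS CLI (left commented out so the sketch has no side effects):
# dcos marathon app add sigterm-test.json
```

After deploying, scale the app down or destroy it and watch the task's stdout in its sandbox. If the grace period is honoured, ' killing' should appear right away and 'test' about five seconds later, with the task going away roughly ten seconds after the SIGTERM. If the task disappears before 'test' is printed, the grace period is not being applied.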