I used to run my backup routine from cron, and everything worked fine:
# crontab entry; cron turns an unescaped % into a newline, hence the \% escapes
tar c --exclude=owncloud --exclude=hadoop -C /var/opt . | pigz -c -p 4 --best | hadoop fs -put - /apps/appBackups/myserver_var_opt_$(date +\%Y-\%m-\%d_\%H-\%M-\%S).tar.gz
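One thing worth keeping in mind once a scheduler decides whether this job failed: by default a shell pipeline's exit status is that of its last command, so if tar or pigz dies, hadoop fs -put can still exit 0 and the run looks successful. A minimal bash sketch of the difference `set -o pipefail` makes; `false` and `cat` stand in for a failing tar and a succeeding upload, and the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical demo: `false` plays a failing tar/pigz,
# `cat` plays an upload step that still succeeds.

# Function bodies in ( ) run in a subshell, so the option change stays local.
without_pipefail() (
  set +o pipefail
  status=0
  false | cat || status=$?
  echo "without pipefail: exit $status"
)

with_pipefail() (
  set -o pipefail
  status=0
  false | cat || status=$?
  echo "with pipefail: exit $status"
)

without_pipefail   # prints: without pipefail: exit 0
with_pipefail      # prints: with pipefail: exit 1
```

In the backup script itself, `set -o pipefail` near the top would make a failed tar or pigz fail the whole run, although a partial archive may already have been written to HDFS by then.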
After I moved it to Mesos Chronos, it started failing intermittently, even when I force-run it:
ssh root@myserver <<'ENDSSH'
bash daily_opt_backup.sh
ENDSSH
The mesos-master.INFO logs are not descriptive enough: they show the state of a task (TASK_RUNNING, the ACKNOWLEDGE call, TASK_FINISHED, and UUIDs) but not the reason the task failed. Where can I find this information?
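The master only tracks task states; the command's own stdout and stderr are kept by the slave that ran it, in the task's sandbox directory under the agent's work directory (also reachable through the "Sandbox" link for the task in the Mesos web UI). A minimal bash sketch for listing those stderr files on a slave, assuming the usual `slaves/<agent>/frameworks/<framework>/executors/<executor>/runs/latest` layout; the `/tmp/mesos` default and the function name are assumptions, so check the `--work_dir` your slaves actually use:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the stderr file of the latest run of every
# executor sandbox on a Mesos agent. /tmp/mesos is an assumed default;
# pass the agent's real --work_dir as the first argument.
list_sandbox_stderr() {
  local work_dir="${1:-/tmp/mesos}"
  local f
  # Assumed layout: slaves/<agent>/frameworks/<fw>/executors/<exec>/runs/latest
  for f in "$work_dir"/slaves/*/frameworks/*/executors/*/runs/latest/stderr; do
    # An unmatched glob stays literal, so skip anything that is not a file.
    if [ -f "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
}
```

Piping its output into `xargs grep -l 'Permission denied'` on each slave is one way to spot the runs where ssh could not authenticate.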
The job failed because Chronos can schedule it on any slave, and some slaves did not have the private key needed to log in as root. The proper fix is to put the script on HDFS so that every mesos-slave can copy and run it:
hadoop fs -get /apps/utils/daily_opt_backup.sh && chmod +x daily_opt_backup.sh &&
  ./daily_opt_backup.sh
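Since the sandbox stderr is where the failure reason ends up, it can help if the command states which step failed. A minimal bash sketch; `run_step` and `fetch_and_run_backup` are hypothetical names, and the HDFS path is the one from the command above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: runs one command and, if it fails, writes the command
# and its exit status to stderr, which Mesos keeps in the task sandbox.
run_step() {
  local status=0
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "step failed (exit $status): $*" >&2
  fi
  return "$status"
}

# The fetch-and-run command, one logged step at a time.
fetch_and_run_backup() {
  run_step hadoop fs -get /apps/utils/daily_opt_backup.sh &&
    run_step chmod +x daily_opt_backup.sh &&
    run_step ./daily_opt_backup.sh
}
```

Because `run_step` returns the step's own exit status, the task still exits nonzero and is reported as failed; the helper only adds a line saying where.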