docker celery supervisord fig

Restarting a Docker container that runs supervisord programs leaves stale pid files and causes errors on restart


I have a Docker container that runs Django Celery workers via supervisord. The program setup is pretty simple:

[program:celery_priority]
command=python manage.py celery worker -E -Q priority --concurrency=2 --loglevel=ERROR
directory=/var/lib/app
stdout_events_enabled = true
stderr_events_enabled = true
stopwaitsecs = 600

[program:celery_medium]
command=python manage.py celery worker -E -Q medium --concurrency=2 --loglevel=ERROR
directory=/var/lib/app
stdout_events_enabled = true
stderr_events_enabled = true
stopwaitsecs = 600

[program:celerycam]
command=python manage.py celerycam
directory=/var/lib/app
stdout_events_enabled = true
stderr_events_enabled = true
stopwaitsecs = 600

Our deployment cycle uses fig to manage the containers. Here is what our fig.yml looks like for the worker:

worker:
  build: .docker/worker
  command: normal
  volumes_from:
    - appdata
  hostname: workerprod
  domainname: project.internal
  links:
    - redis
    - rabbit
    - appdata
    - mail

The problem we are facing is that when we run fig restart worker, the supervisord programs fail because of a pid conflict, with the following errors:

[130.211.XX.XX] out: worker_1     | celery_medium stderr | [2015-02-13 13:40:54,271: WARNING/MainProcess] ERROR: Pidfile (/tmp/med_celery.pid) already exists.
[130.211.XX.XX] out: worker_1     | Seems we're already running? (pid: 17)
[130.211.XX.XX] out: worker_1     | celery_priority stderr | [2015-02-13 13:40:54,272: WARNING/MainProcess] ERROR: Pidfile (/tmp/priority_celery.pid) already exists.
[130.211.XX.XX] out: worker_1     | Seems we're already running? (pid: 16)
[130.211.XX.XX] out: worker_1     | 2015-02-13 18:40:54,359 INFO exited: celery_medium (exit status 0; expected)
[130.211.XX.XX] out: worker_1     | 2015-02-13 18:40:54,359 INFO exited: celery_priority (exit status 0; expected)
[130.211.XX.XX] out: worker_1     | 2015-02-13 18:40:55,360 INFO success: celerycam entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

Yet when we run fig up -d worker it works, apparently because up re-creates the container instead of reusing the existing one. However, this also re-creates all the linked services, so we lose the RabbitMQ data and the Redis cache.

Is there a way to restart the container with a simple fig restart worker and make sure the pid files are cleared on restart? Please advise.


Solution

  • Create an ENTRYPOINT script that cleans up any stale state before running your CMD. For example:

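    FROM someotherimage
    COPY entrypoint.sh /entrypoint.sh
    # The exec-form ENTRYPOINT runs the file directly, so it must be executable
    RUN chmod +x /entrypoint.sh
    ENTRYPOINT ["/entrypoint.sh"]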
    

    And in entrypoint.sh:

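    #!/bin/sh
    # Remove pid files left over from the container's previous run
    rm -f /tmp/*.pid
    # Replace this shell with the container command (e.g. supervisord)
    exec "$@"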
    

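    The ENTRYPOINT script runs every time the container starts, including on fig restart, so any pid files in /tmp are cleared out before the container command runs. Because the script ends with exec, the command then replaces the shell as PID 1 and receives stop signals directly.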
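A slightly more explicit variant of entrypoint.sh is sketched below, under a few assumptions. It logs each file it removes, so the cleanup shows up in fig's output, and it lets the pid directory be overridden. PID_DIR and clean_pid_files are hypothetical names, not part of Docker, fig, or Celery. The delete is deliberately unconditional: every container start gets a fresh PID namespace, so the number recorded by the previous run (pid 16 or 17 in the log above) is likely held by a new, unrelated process, which makes a "is this pid still running?" check unreliable.

```shell
#!/bin/sh
# Sketch of a more explicit entrypoint.sh (assumption, not the original answer).
# PID_DIR is a hypothetical variable; it defaults to /tmp, where the
# workers in the question write their pid files.

# Delete every *.pid file in the directory given as $1.
clean_pid_files() {
    for pidfile in "$1"/*.pid; do
        [ -e "$pidfile" ] || continue   # the glob matched nothing
        echo "entrypoint: removing stale $pidfile" >&2
        rm -f "$pidfile"
    done
}

clean_pid_files "${PID_DIR:-/tmp}"

# Replace this shell with the container command so that it runs as PID 1
# and receives the SIGTERM sent when the container is stopped.
exec "$@"
```

It would replace the three-line script above; the Dockerfile lines stay the same.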