python django database celery celerybeat

Celerybeat schedule executing task multiple times?


I have a task, calculate_common_locations, that is scheduled to run once a week via CELERYBEAT_SCHEDULE. The task simply calls a database function:

@app.task
def calculate_common_locations():
    db.execute("SELECT * FROM calculate_centroids('b')")

This is the entry in CELERYBEAT_SCHEDULE:

CELERYBEAT_SCHEDULE = {
   'common_locations': {
        'task': 'clients.tasks.calculate_common_locations',
        'schedule': crontab(hour=23, day_of_week='sun'), #every week
    },
    [..]
}

The schedule includes other tasks that run once a day or every 10 seconds. Those tasks do not seem to be re-run. Celery Flower, however, shows this task being executed more than 20 times: the first run starts as scheduled, takes about 100 s, succeeds, and then the task starts again.


There is only one celerybeat running:

ps -Af | grep celerybeat 
foo     24359   779  0 01:53 ?        00:00:04 [celeryd: celery@celery:MainProcess] -active- (worker --beat --app=cloud.celeryapp:app --concurrency=10 -l INFO -s /home/foo/run/celerybeat-schedule --pidfile=/home/foo/run/celerybeat.pid)         

This is how celery gets started (via supervisord):

celery worker --beat --app=cloud.celeryapp:app --concurrency=10 -l INFO -s /home/foo/run/celerybeat-schedule --pidfile=/home/foo/run/celerybeat.pid

I have tested it without the --concurrency=10 switch. The database function is still executed multiple times.

The function reads from a large table (more than 1 million rows) that is inserted into frequently (a couple of times a second). The Postgres lock listing shows that all locks are granted.

Is it possible that the task is being re-run because the query terminates at some point?

There are no issues when:

  • the task is run from the django shell (directly or via .delay()),
  • the task's content is replaced by a lightweight sql query (select * from test),
  • the task's content is replaced by a sleep(100).

Versions:

  • celery==3.1.12
  • psql (PostgreSQL) 9.3.5

Solution

  • This may make more sense if you consider what crontab(hour=23, day_of_week='sun') does:

    >>> crontab(hour=23, day_of_week='sun')
    <crontab: * 23 sun * * (m/h/d/dM/MY)>
    

    The minute field defaults to *, so the task executes every minute of the 11 pm hour (23:00 through 23:59) every Sunday, up to 60 times.

    If you want it to execute only once, at 23:00, specify the minute as well:

    crontab(minute=0, hour=23, day_of_week='sun')
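
To make the difference concrete, here is a minimal sketch that counts how many minutes of one Sunday each schedule would fire on. It uses only the standard library; cron_matches and count_runs are hypothetical helpers written for illustration, a simplified stand-in for cron-style matching and not Celery's own implementation.

```python
from datetime import datetime, timedelta


def cron_matches(dt, minute="*", hour="*", day_of_week="*"):
    """Simplified stand-in for cron matching (not Celery's implementation).

    Each field is either "*" (match anything) or a single integer.
    """
    def field_ok(spec, value):
        return spec == "*" or spec == value

    return (
        field_ok(minute, dt.minute)
        and field_ok(hour, dt.hour)
        and field_ok(day_of_week, dt.weekday())
    )


def count_runs(day, **spec):
    """Count the minutes of one calendar day on which the spec fires."""
    start = datetime(day.year, day.month, day.day)
    return sum(
        cron_matches(start + timedelta(minutes=i), **spec)
        for i in range(24 * 60)
    )


SUNDAY = 6  # datetime.weekday(): Monday is 0, Sunday is 6
a_sunday = datetime(2014, 9, 7)  # 2014-09-07 fell on a Sunday

# Minute left at its default "*": fires on every minute of hour 23.
runs_without_minute = count_runs(a_sunday, hour=23, day_of_week=SUNDAY)

# Minute pinned to 0: fires once, at 23:00.
runs_with_minute = count_runs(a_sunday, minute=0, hour=23, day_of_week=SUNDAY)

print(runs_without_minute, runs_with_minute)
```

Since each run in the question takes about 100 seconds, the runs triggered by the later minutes queue up behind the first one, which would be consistent with the task appearing to restart as soon as it succeeds.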