I have a task calculate_common_locations which is meant to run once via CELERYBEAT_SCHEDULE.
The task simply calls a function in the database:
@app.task
def calculate_common_locations():
    db.execute("SELECT * FROM calculate_centroids('b')")
This is the entry in CELERYBEAT_SCHEDULE:
CELERYBEAT_SCHEDULE = {
    'common_locations': {
        'task': 'clients.tasks.calculate_common_locations',
        'schedule': crontab(hour=23, day_of_week='sun'),  # every week
    },
    [..]
}
The schedule includes other tasks that run once a day or every 10 seconds. Those tasks do not seem to be re-run. For this task, however, Celery Flower shows more than 20 executions. The first one starts as scheduled, runs for about 100 s, succeeds, and then the task starts again.
There is only one celerybeat running:
ps -Af | grep celerybeat
foo 24359 779 0 01:53 ? 00:00:04 [celeryd: celery@celery:MainProcess] -active- (worker --beat --app=cloud.celeryapp:app --concurrency=10 -l INFO -s /home/foo/run/celerybeat-schedule --pidfile=/home/foo/run/celerybeat.pid)
This is how celery gets started (via supervisord):
celery worker --beat --app=cloud.celeryapp:app --concurrency=10 -l INFO -s /home/foo/run/celerybeat-schedule --pidfile=/home/foo/run/celerybeat.pid
I have tested it without the --concurrency=10 switch. The database function is still executed multiple times.
The function reads from a large table (more than 1 million rows) that is inserted into quite often (a couple of times a second). The Postgres lock view shows that all locks are granted.
Is it possible that the task is being re-run because the query terminates at some point?
There are no issues when the task is called manually (using .delay()).
This may make more sense if you consider what crontab(hour=23, day_of_week='sun') does:
>>> crontab(hour=23, day_of_week='sun')
<crontab: * 23 sun * * (m/h/d/dM/MY)>
The minute field defaults to *, so this means the task will execute every minute during the 11 pm hour (23:00 through 23:59) every Sunday.
If you want it to execute only once, at the first minute of that hour, specify the minute as well:
crontab(minute=0, hour=23, day_of_week='sun')
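To put numbers on the difference, here is a rough standard-library sketch that counts how many minutes in one week match each schedule. It is not Celery's own matching code: the helpers matches and runs_per_week are made up for illustration, and None stands in for the * wildcard.

```python
from datetime import datetime, timedelta

SUNDAY = 6  # datetime.weekday(): Monday is 0, Sunday is 6


def matches(dt, minute=None, hour=None, weekday=None):
    """Return True if dt matches every given field; None acts like '*'."""
    return (
        (minute is None or dt.minute == minute)
        and (hour is None or dt.hour == hour)
        and (weekday is None or dt.weekday() == weekday)
    )


def runs_per_week(**fields):
    """Count the minutes in one full week that match the given fields."""
    start = datetime(2024, 1, 1)  # an arbitrary Monday at 00:00
    minutes_in_week = 7 * 24 * 60
    return sum(
        matches(start + timedelta(minutes=i), **fields)
        for i in range(minutes_in_week)
    )


# Like crontab(hour=23, day_of_week='sun'): minute is left as a wildcard.
without_minute = runs_per_week(hour=23, weekday=SUNDAY)

# Like crontab(minute=0, hour=23, day_of_week='sun').
with_minute = runs_per_week(minute=0, hour=23, weekday=SUNDAY)

print(without_minute, with_minute)  # 60 1
```

With the minute left out, the schedule comes due 60 times in that hour, which is consistent with the repeated executions seen in Flower. With minute=0 it comes due once a week.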