Search code examples
djangoherokucelerycelerybeat

Celery Beat sending all periodic tasks as due on restart even if they have just run


I am running a celery worker and celery beat with DatabaseScheduler.

I am running these on Heroku along with a django web application. They are being used to send emails and texts at regular intervals, Once per day at midnight for emails and 730am for texts.

I have defined the tasks manually using a chrontab in Periodic Tasks table, but I have also tried defining them in the codebase using celeryapp.conf.beat_schedule, both behave the same.

The tasks look like this

periodic tasks in django admin

The issue I am having is that Heroku restarts its dynos once per day as a policy, and when this happens, the celery beat dyno runs my periodic tasks for some reason. The debug log reads "Scheduler: Sending due task" as if it is normal, whether they are due then or not. I do not know why this is happening.

I have celery results running as well with postgres backed task results and that looks like this, where I have the tasks running at the correct times (midnight and 730am), but also at some random time when heroku restarts the dynos.

How can I stop the tasks from running when this restart happens? Why are they running at all after the restart?

task results in django admin

EDIT: I was able to manually restart using heroku ps:restart beat and watch it happen, still don't know why but here is the debug log where it occurs. Keep in mind the tasks have already run successfully for this day and the last_run was correct prior to restarting. Also, restarting again DOES NOT run the tasks again. I wonder if somehow sometimes the restart doesn't see the last_run date? Still trying to figure this out.

2022-10-15T16:22:33.824655+00:00 app[beat.1]: [2022-10-15 12:22:33,824: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
2022-10-15T16:22:38.828379+00:00 app[beat.1]: [2022-10-15 12:22:38,828: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
2022-10-15T16:22:43.832789+00:00 app[beat.1]: [2022-10-15 12:22:43,832: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
2022-10-15T16:22:46.627485+00:00 heroku[beat.1]: Restarting
2022-10-15T16:22:46.630753+00:00 heroku[beat.1]: State changed from up to starting
2022-10-15T16:22:48.836424+00:00 app[beat.1]: [2022-10-15 12:22:48,836: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
2022-10-15T16:22:48.894489+00:00 heroku[beat.1]: Stopping all processes with SIGTERM
2022-10-15T16:22:48.956144+00:00 app[beat.1]: [2022-10-15 12:22:48,956: DEBUG/MainProcess] Writing entries...
2022-10-15T16:22:48.956272+00:00 app[beat.1]: [2022-10-15 12:22:48,956: DEBUG/MainProcess] Writing entries...
2022-10-15T16:22:48.956533+00:00 app[beat.1]: [2022-10-15 12:22:48,956: DEBUG/MainProcess] Writing entries...
2022-10-15T16:22:48.956585+00:00 app[beat.1]: [2022-10-15 12:22:48,956: DEBUG/MainProcess] Writing entries...
2022-10-15T16:22:49.225748+00:00 heroku[beat.1]: Process exited with status 0
2022-10-15T16:22:50.122842+00:00 heroku[beat.1]: Starting process with command `celery -A hello beat -l debug --scheduler django_celery_beat.schedulers:DatabaseScheduler`
2022-10-15T16:22:50.807227+00:00 heroku[beat.1]: State changed from starting to up
2022-10-15T16:22:51.834963+00:00 app[beat.1]: celery beat v5.2.7 (dawn-chorus) is starting.
2022-10-15T16:22:52.276824+00:00 app[beat.1]: __    -    ... __   -        _
2022-10-15T16:22:52.276844+00:00 app[beat.1]: LocalTime -> 2022-10-15 12:22:52
2022-10-15T16:22:52.276845+00:00 app[beat.1]: Configuration ->
2022-10-15T16:22:52.276845+00:00 app[beat.1]: . broker -> redis://:**@ec2-54-165-246-156.compute-1.amazonaws.com:31259//
2022-10-15T16:22:52.276846+00:00 app[beat.1]: . loader -> celery.loaders.app.AppLoader
2022-10-15T16:22:52.276846+00:00 app[beat.1]: . scheduler -> django_celery_beat.schedulers.DatabaseScheduler
2022-10-15T16:22:52.276846+00:00 app[beat.1]:
2022-10-15T16:22:52.276846+00:00 app[beat.1]: . logfile -> [stderr]@%DEBUG
2022-10-15T16:22:52.276851+00:00 app[beat.1]: . maxinterval -> 5.00 seconds (5s)
2022-10-15T16:22:52.277418+00:00 app[beat.1]: [2022-10-15 12:22:52,277: DEBUG/MainProcess] Setting default socket timeout to 30
2022-10-15T16:22:52.277510+00:00 app[beat.1]: [2022-10-15 12:22:52,277: INFO/MainProcess] beat: Starting...
2022-10-15T16:22:52.280410+00:00 app[beat.1]: [2022-10-15 12:22:52,280: DEBUG/MainProcess] DatabaseScheduler: initial read
2022-10-15T16:22:52.280449+00:00 app[beat.1]: [2022-10-15 12:22:52,280: DEBUG/MainProcess] Writing entries...
2022-10-15T16:22:52.280522+00:00 app[beat.1]: [2022-10-15 12:22:52,280: DEBUG/MainProcess] DatabaseScheduler: Fetching database schedule
2022-10-15T16:22:52.312792+00:00 app[beat.1]: [2022-10-15 12:22:52,312: DEBUG/MainProcess] Current schedule:
2022-10-15T16:22:52.312794+00:00 app[beat.1]: <ModelEntry: email-reminders hello.celery_tasks.email_reminders(*[], **{}) <crontab: 0 0
2022-10-15T16:22:52.312796+00:00 app[beat.1]: * *
2022-10-15T16:22:52.312797+00:00 app[beat.1]: * (m/h/d/dM/MY), America/New_York>
2022-10-15T16:22:52.312797+00:00 app[beat.1]: >
2022-10-15T16:22:52.312798+00:00 app[beat.1]: <ModelEntry: celery.backend_cleanup celery.backend_cleanup(*[], **{}) <crontab: 0 4
2022-10-15T16:22:52.312798+00:00 app[beat.1]: * *
2022-10-15T16:22:52.312798+00:00 app[beat.1]: * (m/h/d/dM/MY), America/New_York>
2022-10-15T16:22:52.312798+00:00 app[beat.1]: >
2022-10-15T16:22:52.312798+00:00 app[beat.1]: <ModelEntry: text-reminders hello.celery_tasks.text_reminders(*[], **{}) <crontab: 30 7
2022-10-15T16:22:52.312799+00:00 app[beat.1]: * *
2022-10-15T16:22:52.312799+00:00 app[beat.1]: * (m/h/d/dM/MY), America/New_York>
2022-10-15T16:22:52.312799+00:00 app[beat.1]: >
2022-10-15T16:22:52.328971+00:00 app[beat.1]: [2022-10-15 12:22:52,328: DEBUG/MainProcess] beat: Ticking with max interval->5.00 seconds
2022-10-15T16:22:52.336205+00:00 app[beat.1]: [2022-10-15 12:22:52,336: INFO/MainProcess] Scheduler: Sending due task email-reminders (hello.celery_tasks.email_reminders)
2022-10-15T16:22:52.363919+00:00 app[beat.1]: [2022-10-15 12:22:52,363: DEBUG/MainProcess] beat: Synchronizing schedule...
2022-10-15T16:22:52.363924+00:00 app[beat.1]: [2022-10-15 12:22:52,363: DEBUG/MainProcess] Writing entries...
2022-10-15T16:22:52.365247+00:00 app[worker.1]: [2022-10-15 12:22:52,365: INFO/MainProcess] Task hello.celery_tasks.email_reminders[9f9c958f-b62d-4078-ac87-9b76d255f427] received
2022-10-15T16:22:52.371464+00:00 app[beat.1]: [2022-10-15 12:22:52,371: DEBUG/MainProcess] hello.celery_tasks.email_reminders sent. id->9f9c958f-b62d-4078-ac87-9b76d255f427
2022-10-15T16:22:52.373084+00:00 app[beat.1]: [2022-10-15 12:22:52,373: INFO/MainProcess] Scheduler: Sending due task celery.backend_cleanup (celery.backend_cleanup)
2022-10-15T16:22:52.374784+00:00 app[beat.1]: [2022-10-15 12:22:52,374: DEBUG/MainProcess] celery.backend_cleanup sent. id->5b2b1962-af5e-4ec8-8b64-2078beabeaa3
2022-10-15T16:22:52.375157+00:00 app[worker.1]: [2022-10-15 12:22:52,374: INFO/MainProcess] Task celery.backend_cleanup[5b2b1962-af5e-4ec8-8b64-2078beabeaa3] received
2022-10-15T16:22:52.376237+00:00 app[beat.1]: [2022-10-15 12:22:52,376: INFO/MainProcess] Scheduler: Sending due task text-reminders (hello.celery_tasks.text_reminders)
2022-10-15T16:22:52.377725+00:00 app[beat.1]: [2022-10-15 12:22:52,377: DEBUG/MainProcess] hello.celery_tasks.text_reminders sent. id->d2f0cce8-e036-4d21-880e-435f2ea514a5
2022-10-15T16:22:52.377945+00:00 app[worker.1]: [2022-10-15 12:22:52,377: INFO/MainProcess] Task hello.celery_tasks.text_reminders[d2f0cce8-e036-4d21-880e-435f2ea514a5] received
2022-10-15T16:22:52.378998+00:00 app[beat.1]: [2022-10-15 12:22:52,378: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
2022-10-15T16:22:52.393541+00:00 app[worker.1]: [2022-10-15 12:22:52,393: INFO/ForkPoolWorker-8] Task hello.celery_tasks.email_reminders[9f9c958f-b62d-4078-ac87-9b76d255f427] succeeded in 0.02742149494588375s: None
2022-10-15T16:22:52.413622+00:00 app[worker.1]: [2022-10-15 12:22:52,413: INFO/ForkPoolWorker-2] Task hello.celery_tasks.text_reminders[d2f0cce8-e036-4d21-880e-435f2ea514a5] succeeded in 0.03441403107717633s: None
2022-10-15T16:22:52.421015+00:00 app[worker.1]: [2022-10-15 12:22:52,420: INFO/ForkPoolWorker-1] Task celery.backend_cleanup[5b2b1962-af5e-4ec8-8b64-2078beabeaa3] succeeded in 0.04107004823163152s: None
2022-10-15T16:22:57.385878+00:00 app[beat.1]: [2022-10-15 12:22:57,385: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.
2022-10-15T16:23:02.391952+00:00 app[beat.1]: [2022-10-15 12:23:02,391: DEBUG/MainProcess] beat: Waking up in 5.00 seconds.

EDIT2:

Tried manually clearing the last_run with

from django_celery_beat.models import PeriodicTask, PeriodicTasks
PeriodicTask.objects.update(last_run_at=None)
PeriodicTasks.changed()

and restarting the dynos, but I can't seem to replicate it. However it has happened every single heroku dyno restart so far.

EDIT3:

I connected a debugger to celery beat and painstakingly stepped through the tick code, watching how it initializes and begins using periodic tasks. I could not yet find a reason for my issue yet (especially because I couldn't see it actually happen, just looking thru the code hoping to find the answer). Everything makes sense with the startup procedure. I fear it will end up being something relating to my timezones or a deeper part of the calculation function which determines if the task is due. I will keep looking and update if I resolve it.

EDIT4:

I stepped through everything a bunch of times, still didn't find a cause. I see at the root, a task is determined due if its last_run_at value + a calculated delta of when it is next supposed to run - now is zero or negative, then it will run. So I have to assume something is wrong with the last_run_at date, or during this calculation, on beat restart during the populate_heap() function. For now I have had to implement a band aid to ignore repeated calls to my tasks outside of their scheduled times until I can fully resolve this.


Solution

  • While I didn't verify the exact cause, I think I found the solution as it didn't happen again today, will update if its not fixed though. If beat doesn't see last_run_at it uses datetime.now(). I believe somehow the calculation of (last run + delta - now) was coming back negative, causing it to run at the start.

    I noticed a setting inside the code of beat called "DJANGO_CELERY_BEAT_TZ_AWARE" that I did not have set. I also think there was a mistake in my settings with USE_TZ false as well.

    I have celery settings mapped to "CELERY_", and this was my before and after in settings.py:

    Before:

    CELERY_ENABLE_UTC = False
    CELERY_TIMEZONE = "America/New_York"
    TIME_ZONE = "America/New_York"
    USE_TZ = False 
    

    After (No longer ocurring):

    CELERY_ENABLE_UTC = False
    CELERY_TIMEZONE = "America/New_York"
    DJANGO_CELERY_BEAT_TZ_AWARE = True
    TIME_ZONE = "America/New_York"
    USE_TZ = True