Django cron runs multiple times, but it shouldn't


I have several cron jobs defined with django-cron, and each CronJob class sets ALLOW_PARALLEL_RUNS = False. I trigger them from the Linux crontab as follows:

*/1 * * * * /home/social/centralsystem/venv/bin/python3.6 /home/social/centralsystem/manage.py runcrons 

After it has been running for a while (for example, two months), I see many instances of the same cron jobs running at once, which puts a heavy load on the server. What causes this?

Here is one of my cron classes:

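from django_cron import CronJobBase, Schedule

# user_queuing and SOCIAL_USERS_MODEL are defined elsewhere in the project.


class UserTaskingCronJob(CronJobBase):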
    ALLOW_PARALLEL_RUNS = False
    RUN_EVERY_MINS = 5

    schedule = Schedule(run_every_mins=RUN_EVERY_MINS)
    code = 'user_tasking'

    def do(self):
        args = {
            'telegram': {
                'need_recrawl_threshold': 60 * 2,
                'count': 100,
            },
            'newsAgency': {
                'need_recrawl_threshold': 10,
                'count': 100,
            },
            'twitter': {
                'need_recrawl_threshold': 60 * 4,
                'count': 500
            },
        }
        for social_network in ['telegram', 'newsAgency', 'twitter']:
            user_queuing(
                SOCIAL_USERS_MODEL[social_network],
                social_network,
                args[social_network]['need_recrawl_threshold'],
                args[social_network]['count'],
            )

Solution

  • You have to be careful with django-cron if you have many tasks that run for different lengths of time. runcrons goes through all your cron classes and runs them one after another. It also only logs a cron run (successful or not) to the database once the run has finished. I think django-cron could be improved by saving the log entry when a job starts (and checking whether a run is already in progress), but that would still not rule out overlaps when several jobs run in one pass rather than a single long one.

    You are running runcrons every minute, so you will run into trouble in these cases:

    • A single task that is due in a run takes longer than 1 minute.
    • All the tasks that are due in a run together take longer than 1 minute.

    In both cases, some tasks are not logged to the database in time, so the next runcrons command starts them again while they are still running.
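
The overlap described above can be illustrated with a small, self-contained simulation. This is only a simplified model built on the assumptions in this answer (a run becomes visible in the log only once it has finished, and a job is due when no visible run started within the last RUN_EVERY_MINS). It is not django-cron's actual code, and the function name and parameters are made up for the illustration.

```python
def simulate(job_duration_s, run_every_s, cron_interval_s=60, horizon_s=600):
    """Return the times (in seconds) at which the job gets launched.

    Simplified model, not django-cron itself:
    - a `runcrons` process starts every `cron_interval_s` seconds;
    - a run is written to the log only when it finishes;
    - the job is due if no logged run started within the last `run_every_s`.
    """
    starts = []  # launch time of every run, finished or not
    for now in range(0, horizon_s, cron_interval_s):
        # Runs still in progress are invisible, because they are not logged yet.
        visible = [s for s in starts if s + job_duration_s <= now]
        recently_ran = any(s > now - run_every_s for s in visible)
        if not recently_ran:
            starts.append(now)
    return starts


def overlapping_starts(starts, job_duration_s):
    """Return the launches that happened while an earlier run was still going."""
    return [
        s for i, s in enumerate(starts)
        if any(prev + job_duration_s > s for prev in starts[:i])
    ]


fast = simulate(job_duration_s=10, run_every_s=300)
slow = simulate(job_duration_s=150, run_every_s=300)
print("10 s job launched at:", fast)
print("150 s job launched at:", slow)
print("overlapping launches:", overlapping_starts(slow, 150))
```

In this model, a 10-second job that should run every 5 minutes is launched at 0 s and 300 s as intended, while a 150-second job is launched at 0, 60 and 120 s before its first run is even logged, and the pattern repeats once the log entries expire.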

    To avoid this, do the following:

    • Identify tasks that take longer than 1 minute to run and run them with a different schedule that ensures they have finished before the next run.
    • In the crontab, run separate runcrons commands, each with its own list of cron classes, making sure that every list finishes in under 1 minute in total, e.g.
    */1 * * * * ./bin/python3.6 manage.py runcrons "my_app.crons.FirstCron" "my_app.crons.SecondCron"
    */1 * * * * ./bin/python3.6 manage.py runcrons "my_app.crons.ThirdCron"
    */10 * * * * ./bin/python3.6 manage.py runcrons "my_app.crons.LongCron"
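
One way to work out such a split is sketched below. It assumes you have already measured how long each cron class typically takes (the durations in the example are invented), groups the short jobs so that each group stays under the 1-minute budget, and gives every long job its own line with a wider interval. The function name, the `STEPS` tuple, and the grouping strategy (first-fit, longest job first) are choices made for this sketch, not something django-cron provides.

```python
# Minute steps that divide an hour evenly, so "*/N" fires at a fixed interval.
STEPS = (2, 3, 4, 5, 6, 10, 12, 15, 20, 30)


def plan_crontab(durations_s, budget_s=60,
                 command="./bin/python3.6 manage.py runcrons"):
    """Build crontab lines from measured job durations (in seconds).

    Jobs shorter than `budget_s` are packed into groups whose total stays
    below the budget; each longer job gets its own, less frequent line.
    """
    groups = []  # each entry: [total_seconds, [class paths]]
    lines = []
    # Place the longest jobs first (first-fit decreasing).
    ordered = sorted(durations_s.items(), key=lambda kv: kv[1], reverse=True)
    for path, secs in ordered:
        if secs >= budget_s:
            # Pick the smallest interval that is longer than the job itself.
            step = next((s for s in STEPS if s * 60 > secs), None)
            if step is None:
                raise ValueError(f"{path} is too long for a sub-hourly schedule")
            lines.append(f'*/{step} * * * * {command} "{path}"')
            continue
        for group in groups:
            if group[0] + secs < budget_s:
                group[0] += secs
                group[1].append(path)
                break
        else:
            # No existing group has room, so start a new one.
            groups.append([secs, [path]])
    for _, paths in groups:
        quoted = " ".join(f'"{p}"' for p in paths)
        lines.append(f"*/1 * * * * {command} {quoted}")
    return lines


# Hypothetical measurements, in seconds.
measured = {
    "my_app.crons.FirstCron": 20,
    "my_app.crons.SecondCron": 25,
    "my_app.crons.ThirdCron": 40,
    "my_app.crons.LongCron": 400,
}
for line in plan_crontab(measured):
    print(line)
```

Measured durations vary from run to run, so it is safer to leave some headroom below the budget than to fill each group right up to 60 seconds.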