Celery seems to be picking up both my old code and my new code. I have tried clearing the cache, flushing the broker queue (Redis), restarting Celery, etc., but none of these fixes the issue.
For context, we push new releases to various servers periodically. The web application uses Django REST Framework on the backend and Celery for scheduling asynchronous tasks. When we recently deployed a new version of the code, the application behaved very strangely: it ran artifacts from the old code alongside parts of the new code. This was baffling until we found a GitHub thread (https://github.com/celery/celery/issues/3519) that describes exactly the issue we faced. There are no good answers in that thread, so I am posting here in case anyone with Celery knowledge knows a workaround to stop Celery from picking up the old artifacts.
The deployment is done through Jenkins build scripts; the script is below. For obvious reasons, I have replaced our application name with "proj".
# Stop the web server and the Celery workers before deploying
sudo /bin/systemctl stop httpd
sudo /bin/systemctl stop celery

# Flush Redis so no stale task messages survive the deploy
/bin/redis-cli flushall

# Install/upgrade dependencies into the project virtualenv
/srv/proj/bin/pip install --no-cache-dir --upgrade -r /srv/proj/requirements/staging.txt

# Fetch the new code and check out the release tag
/usr/bin/git fetch
/usr/bin/git fetch --tags
/usr/bin/git checkout $TAG

# Apply database migrations
/srv/proj/bin/python manage.py migrate

# Bring everything back up
sudo /bin/systemctl restart httpd
sudo /bin/systemctl restart proj
sudo /bin/systemctl start celery
OP, the problem is almost invariably that your servers still have old code on them. There are two possible issues:

1. The checkout command fails: checkout can refuse to switch branches if there are local changes in the directory. A fail-fast version of that step is sketched after this list.
2. The deployment script doesn't pull the latest changes to the server; git fetch only updates the local refs, it never touches the working tree.
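A minimal hardening sketch for the checkout step, assuming the /srv/proj layout from the question and that any local edits on the servers are disposable:

set -e                                # abort the deploy on the first failing command
cd /srv/proj
/usr/bin/git fetch --tags
/usr/bin/git checkout --force "$TAG"  # --force discards local changes that would block the switch
/usr/bin/git rev-parse HEAD           # print the commit actually deployed, for the build log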
The more common approach, and the one we use, is to keep the virtualenv in a separate directory and, on each deploy, clone the repository into a new, versioned directory. Then switch the "main" app directory symlink to point at the latest version. The latter, for example, is the approach used in AWS's Elastic Beanstalk.

Ultimately, you won't be able to diagnose this problem without confirming that all of your servers are 100% identical. So, if you have devs or admins who SSH into these boxes, chances are that one or more of them made some change that affected (1) or (2).
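A rough sketch of that versioned-directory approach; the repository URL, release layout, and hostnames here are hypothetical, not taken from the question:

RELEASE=/srv/proj/releases/"$TAG"
/usr/bin/git clone --branch "$TAG" --depth 1 git@example.com:org/proj.git "$RELEASE"  # hypothetical repo URL
ln -sfn "$RELEASE" /srv/proj/current  # repoint the "main" app symlink at the new release
sudo /bin/systemctl restart celery    # workers must restart to import from the new path

# Sanity check: every server should report the same commit
for h in app1 app2 app3; do
    ssh "$h" git -C /srv/proj/current rev-parse HEAD
done

Because each release lives in its own directory and is never modified in place, a half-finished checkout can't leave a worker with a mix of old and new files; the worst case is that the symlink still points at the previous release.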