What is the best way to run a scheduled job?


I have a project that contains two parts: the first is a Flask API and the second is a script that should run on a schedule. The Flask app is served from a Docker image that runs in OpenShift.

My problem is where to schedule the second script. I have access to GitLab CI/CD, but that's not really its purpose. Building a Docker image and running it on OpenShift is also not an option, because the script would run more often than needed if there is more than one pod. The only option I can think of is a regular server with cron.

Is there a better solution?

Thanks


Solution

  • There are several aspects to your question and several ways to solve it. Here is some brief information on where to start.

    Pythonic-way

    You can deploy a Celery worker, together with the Celery beat scheduler, to handle the scheduled jobs. The Celery documentation explains how to set up workers in Python: https://docs.celeryproject.org/en/latest/userguide/workers.html
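    As a rough illustration of the Celery approach, here is a minimal sketch of a periodic task definition. The task name `jobs.run_script`, the hourly interval, and the Redis broker URL are all assumptions made for the example, not details from the question.

```python
from datetime import timedelta

# Hypothetical schedule: run the task named "jobs.run_script" once an hour.
BEAT_SCHEDULE = {
    "run-script-every-hour": {
        "task": "jobs.run_script",
        "schedule": timedelta(hours=1),
    },
}


def run_script():
    """Placeholder for the script that should run on a schedule."""
    return "done"


def make_celery_app(broker_url="redis://localhost:6379/0"):
    """Build a Celery app with the periodic schedule attached.

    The broker URL is an assumed default; point it at your own broker.
    """
    # Imported lazily so the schedule above can be inspected without Celery.
    from celery import Celery

    app = Celery("jobs", broker=broker_url)
    app.conf.beat_schedule = BEAT_SCHEDULE
    # Register the plain function under the name used in the schedule.
    app.task(name="jobs.run_script")(run_script)
    return app
```

    With `app = make_celery_app()` exposed at module level, the workers can be scaled freely, while the beat scheduler should run as exactly one instance so that each tick is enqueued only once.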

    This article on dev.to shows a full Celery deployment and should give you an idea of how to extend your own deployment. Its worker manifest (written for a Django project, but the same structure applies to Flask) looks like this:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: celery-worker
      labels:
        deployment: celery-worker
    spec:
      replicas: 1
      selector:
        matchLabels:
          pod: celery-worker
      template:
        metadata:
          labels:
            pod: celery-worker
        spec:
          containers:
            - name: celery-worker
              image: backend:11
              command: ["celery", "--app=backend.celery_app:app", "worker", "--loglevel=info"]
              env:
                - name: DJANGO_SETTINGS_MODULE
                  value: 'backend.settings.minikube'
    
                - name: SECRET_KEY
                  value: "my-secret-key"
    
                - name: POSTGRES_NAME
                  value: postgres
    
                - name: POSTGRES_USER
                  valueFrom:
                    secretKeyRef:
                      name: postgres-credentials
                      key: user
    
                - name: POSTGRES_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: postgres-credentials
                      key: password
    

    Kubernetes-way

    In Kubernetes (OpenShift is a distribution of Kubernetes), you can create a CronJob, which executes a specific task on a schedule, similar to this:

    kubectl create cronjob hello --image=busybox --schedule="*/1 * * * *" --restart=OnFailure -- /bin/sh -c "date; echo Hello from the Kubernetes cluster"
    

    which is adapted from the Kubernetes docs.
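
    The same job can also be described declaratively. The following is a minimal sketch, written in Python to match the project, that builds a `batch/v1` CronJob manifest and prints it as JSON. The job name, the `script.py` entry point, and the hourly schedule are assumptions; `backend:11` is simply the image tag from the worker manifest above.

```python
import json


def build_cronjob(name, image, schedule, command):
    """Return a Kubernetes batch/v1 CronJob manifest as a plain dict."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            # Do not start a new run while the previous one is still active.
            "concurrencyPolicy": "Forbid",
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            "restartPolicy": "OnFailure",
                            "containers": [
                                {"name": name, "image": image, "command": command}
                            ],
                        }
                    }
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = build_cronjob(
        name="scheduled-script",          # hypothetical job name
        image="backend:11",               # image tag reused from the example above
        schedule="0 * * * *",             # assumed: once an hour, on the hour
        command=["python", "script.py"],  # hypothetical entry point
    )
    print(json.dumps(manifest, indent=2))
```

    Since kubectl accepts JSON as well as YAML, the output can be piped into `kubectl apply -f -` (or `oc apply -f -`). The CronJob controller normally creates one Job per scheduled tick regardless of how many API pods are running, which avoids the duplicate-run problem described in the question.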

    Cloud way

    You can also use a serverless platform, e.g. AWS Lambda, to execute a scheduled job. A nice property of AWS Lambda is that its free tier will likely be more than enough for your use case.

    See AWS example code here
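
    For orientation, a scheduled Lambda function in Python can be as small as the sketch below. It assumes the function is triggered by an EventBridge schedule rule; `run_script` is a hypothetical stand-in for the real job, and the returned payload shape is only illustrative.

```python
import json
from datetime import datetime, timezone


def run_script():
    """Placeholder for the real scheduled work (hypothetical)."""
    return {"processed": 0}


def lambda_handler(event, context):
    """Entry point that AWS Lambda calls on each scheduled invocation."""
    # Scheduled events include the trigger time in the "time" field;
    # fall back to the current UTC time if it is missing.
    triggered_at = event.get("time") or datetime.now(timezone.utc).isoformat()
    result = run_script()
    return {
        "statusCode": 200,
        "body": json.dumps({"triggered_at": triggered_at, "result": result}),
    }
```

    The schedule itself lives outside the code, in the rule that invokes the function, so the handler stays a plain function that can be called and tested locally.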