node.js, firebase, google-cloud-firestore, google-cloud-functions, google-cloud-scheduler

Run a Cron Job every 30 minutes after an onCreate Firestore event


I want a cron job/scheduler that runs every 30 minutes after an onCreate event occurs in Firestore. The cron job should trigger a Cloud Function that picks up the documents created in the last 30 minutes, validates them against a JSON schema, and saves them in another collection. How do I achieve this by writing such a scheduler programmatically? Also, what would be a fail-safe mechanism, and how can I queue or track the documents created before the cron job runs so that they are pushed to the other collection?


Solution

  • Building a queue with Firestore is simple and fits your use case well. The idea is to write tasks with a due date to a queue collection and to process each task once it is due.

    Here's an example.

    1. Whenever your initial onCreate event for your collection occurs, write a document with the following data to a tasks collection:
        duedate: new Date(Date.now() + 30 * 60 * 1000) // 30 minutes from now
        type: 'yourjob'
        status: 'scheduled'
        data: '...' // <-- put whatever data here you need to know when processing the task
    
    
    2. Have a worker pick up available work regularly, e.g. every minute, depending on your needs:
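    // v1 API; on firebase-functions v6+ import from 'firebase-functions/v1' instead
    import * as functions from 'firebase-functions';
    import * as admin from 'firebase-admin';

    admin.initializeApp();
    const db = admin.firestore();

    // Define what happens on what task type
    type Workers = { [type: string]: (data: any) => Promise<unknown> };
    const workers: Workers = {
      yourjob: (data) => db.collection('xyz').add({ foo: data }),
    };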
    
    
    // The following needs to be scheduled
    
    export const checkQueue = functions.https.onRequest(async (req, res) => {
      // Consistent timestamp for the whole run
      const now = admin.firestore.Timestamp.now();
      // Check which tasks are due (this query needs a composite index on status and duedate)
      const query = db.collection('tasks').where('duedate', '<=', now).where('status', '==', 'scheduled');
      const tasks = await query.get();
      // Process tasks and mark them in the queue as done
      const jobs: Promise<unknown>[] = [];
      tasks.forEach(snapshot => {
        const { type, data } = snapshot.data();
        console.info('Executing job for task ' + JSON.stringify(type) + ' with data ' + JSON.stringify(data));
        // Start from a resolved promise so an unknown task type ends up in .catch below
        const job = Promise.resolve()
          .then(() => workers[type](data))
          // Update task doc with status or error
          .then(() => snapshot.ref.update({ status: 'complete' }))
          .catch((err) => {
            console.error('Error when executing worker', err);
            return snapshot.ref.update({ status: 'error' });
          });

        jobs.push(job);
      });
      try {
        await Promise.all(jobs);
        res.send('ok');
      } catch (err) {
        // Reached only if updating a task's status itself fails
        console.error('Error', err);
        res.status(500).send('error');
      }
    });
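
Step 1 above can be sketched as a small helper that builds the task document. This is only an illustration: the `Task` interface, the `buildTask` function, and the `DELAY_MINUTES` constant are hypothetical names, and the field names are simply the ones from the list in step 1.

```typescript
// Hypothetical shape of a queue entry; field names follow the list in step 1.
interface Task<T> {
  duedate: Date;
  type: string;
  status: 'scheduled' | 'complete' | 'error';
  data: T;
}

// Assumed delay between document creation and processing.
const DELAY_MINUTES = 30;

// Build the task for a newly created document. The creation time is passed in
// rather than read from the clock, which keeps the function easy to test.
function buildTask<T>(
  type: string,
  data: T,
  createdAt: Date,
  delayMinutes: number = DELAY_MINUTES,
): Task<T> {
  const duedate = new Date(createdAt.getTime() + delayMinutes * 60 * 1000);
  return { duedate, type, status: 'scheduled', data };
}

// A document created at 12:00 UTC yields a task that is due at 12:30 UTC.
const task = buildTask('yourjob', { docId: 'abc' }, new Date('2024-01-01T12:00:00Z'));
```

Inside the onCreate trigger, the returned object would then be written with something like `db.collection('tasks').add(task)`.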
    

    You have different options to trigger the checking of the queue if there is a task that is due:

    • Using an HTTP function as in the example above. This requires you to make an HTTP call to the function regularly so that it executes and checks whether a task is due. Depending on your needs, you could do this from your own server or use a service like cron-job.org to make the calls. Note that the HTTP function will be publicly available, so others could potentially call it too. However, if you make your check code idempotent, that shouldn't be an issue.
    • Use Firebase's built-in scheduled functions, which use Cloud Scheduler under the hood. With them you can trigger the queue check directly:
        export const scheduledFunctionCrontab =
        functions.pubsub.schedule('* * * * *').onRun(async (context) => {
            console.log('This will be run every minute!');
            // Include code from checkQueue here from above
            return null;
        });
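
One way to read "idempotent" for the HTTP option is that a task is claimed before it is processed, so a second call that arrives in the meantime skips it. The sketch below shows that idea with an in-memory `Map` standing in for the tasks collection. The `'processing'` status, the `store` map, and `claimTask` are assumptions made for illustration and are not part of the example above.

```typescript
// 'processing' is an assumed extra status used to mark a claimed task.
type Status = 'scheduled' | 'processing' | 'complete' | 'error';

// In-memory stand-in for the tasks collection: task id -> status.
const store = new Map<string, Status>();
store.set('task-1', 'scheduled');

// Claim a task only if it is still 'scheduled'. Returns true for the caller
// that wins the claim and false for every later caller.
function claimTask(id: string): boolean {
  if (store.get(id) !== 'scheduled') {
    return false;
  }
  store.set(id, 'processing');
  return true;
}

const firstCall = claimTask('task-1');  // wins the claim
const secondCall = claimTask('task-1'); // task already claimed, so it is skipped
```

With Firestore, the read-check-write would have to run inside a transaction (`db.runTransaction`) so that two concurrent invocations cannot both claim the same task.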
    

    Using such a queue also makes your system more robust: if something goes wrong along the way, you will not lose tasks that would otherwise exist only in memory. As long as tasks are not marked as processed, a worker, once fixed, will pick them up and reprocess them. This of course depends on your implementation.
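
As a rough sketch of such an implementation, a failed task could be rescheduled a limited number of times before it is parked with status `'error'`. Everything here is an assumption rather than part of the answer above: the `attempts` counter, `MAX_ATTEMPTS`, `RETRY_DELAY_MS`, and both function names are hypothetical.

```typescript
interface QueuedTask {
  id: string;
  duedate: Date;
  status: 'scheduled' | 'complete' | 'error';
  attempts: number; // hypothetical counter, not in the original task shape
}

const MAX_ATTEMPTS = 3;               // assumed retry limit
const RETRY_DELAY_MS = 5 * 60 * 1000; // assumed delay before the next attempt

// Tasks the worker should run now: still scheduled and already due.
function selectDue(tasks: QueuedTask[], now: Date): QueuedTask[] {
  return tasks.filter(
    (t) => t.status === 'scheduled' && t.duedate.getTime() <= now.getTime(),
  );
}

// After a failed run: reschedule until the attempts are used up, then park
// the task as 'error' so that it is no longer picked up automatically.
function afterFailure(task: QueuedTask, now: Date): QueuedTask {
  const attempts = task.attempts + 1;
  if (attempts >= MAX_ATTEMPTS) {
    return { ...task, attempts, status: 'error' };
  }
  return {
    ...task,
    attempts,
    status: 'scheduled',
    duedate: new Date(now.getTime() + RETRY_DELAY_MS),
  };
}
```

In the `checkQueue` example, the result of `afterFailure` would replace the plain `{ status: 'error' }` update in the `.catch` handler.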