firebase firebase-realtime-database distributed-computing

Firebase: Making sure an action is performed only once using multiple workers

I am matching the items from two lists; e.g. here element c of A and element c of B would match. I then do some processing and add the matched pair to another list.

 - List A
  - a
  - b
  - c

 - List B
  - c
  - d

To do this I watch for additions to both list A and list B, and check whether a match exists whenever something is added.
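To make the setup concrete, here is a minimal single-process sketch of the matcher described above. In-memory sets stand in for the two Firebase lists, and `on_added` is a hypothetical handler standing in for whatever child-added listener the real code uses. This is the work that ends up duplicated once several machines run it.

```python
class Matcher:
    """Single-process sketch: on each insert, look for the same item in the
    opposite list. Plain sets stand in for list A and list B (assumption)."""

    def __init__(self):
        self.lists = {"A": set(), "B": set()}
        self.matches = []

    def on_added(self, list_name, item):
        # Hypothetical handler called for every item added to A or B.
        other = "B" if list_name == "A" else "A"
        self.lists[list_name].add(item)
        if item in self.lists[other]:
            self.matches.append(item)  # the "processing" step
            return True
        return False


m = Matcher()
for x in ["a", "b", "c"]:
    m.on_added("A", x)
for x in ["c", "d"]:
    m.on_added("B", x)
print(m.matches)  # ['c']
```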

This works well, but there are too many inserts for a single client to keep up with.

So I need to run my matcher on multiple machines to speed things up.

But I want each match to be processed by only one machine, i.e. if machine 1 finds a match, there's no point in machine 2 processing it as well.

I tried using atomic commits, but while this prevents concurrent matches from interfering with each other, the matching is still done twice.

How could I "lock" elements to make sure other machines don't consider them once the matching process started?

Solution

Firebase does not provide native support for something like this, and I would additionally be concerned about the lack of idempotency in the streaming protocol itself. If you subscribe to updates on a topic but the node itself dies, on your next server start you will get a VALUE update, not a collection of all the INCREMENTAL updates that occurred while your node was down.
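Because a restarted worker only sees the current value, one way to avoid missing matches is to reconcile from full snapshots instead of relying on incremental events. The sketch below assumes the worker can read both lists and the set of already-recorded matches as plain Python lists; `reconcile` is a hypothetical helper, not a Firebase API.

```python
def reconcile(list_a, list_b, recorded_matches):
    """Recompute matches from full snapshots of both lists and return only
    those that have not been recorded yet (hypothetical helper)."""
    return sorted((set(list_a) & set(list_b)) - set(recorded_matches))


# Suppose the worker recorded "c", then died while "e" was added to both
# lists. On restart it reads the current VALUE of each list instead of
# the events it missed.
print(reconcile(["a", "b", "c", "e"], ["c", "d", "e"], recorded_matches=["c"]))  # ['e']
```

This only finds what is missing; it does not by itself stop two restarted workers from both picking up `"e"`, which is why some form of per-job locking is still needed.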

With good data structures you could "roll your own" facilities like this. After all, cluster-aware task processors with task idempotency and worker locking, like Resque and Celery, do exactly that with not much more in the way of base resources (Redis, a DB, etc.). You will need to add data sets to manage worker locking, job ID locking by workers, recovery/error handling, etc. However, if you review the code they use to do this, you will quickly see that it takes more work than a simple StackOverflow post can cover.
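As a rough illustration of the worker-locking and recovery data sets mentioned above, here is a minimal lease-based claim table. It is a hypothetical in-memory sketch, not Resque's or Celery's actual design: a worker must acquire a per-job lease (a compare-and-set under a lock) before processing, the lease expires after `ttl` seconds so work held by a crashed worker becomes claimable again, and only the current lease holder may mark the job done. `now` is passed in explicitly to keep the example deterministic. In Firebase the acquire step could be written as a transaction on a per-job path; note that the transaction update function may be called more than once (and may first see a locally cached value), so the decision to process should be based on whether the transaction committed.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    owner: str
    expires_at: float


class LeaseTable:
    """Hypothetical lease store: one live claim per job, expiring after ttl."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._leases = {}
        self._done = set()
        self._lock = threading.Lock()  # makes acquire/complete atomic

    def acquire(self, job_id, worker_id, now):
        # Succeeds if the job is unclaimed, the lease has expired, or the
        # caller already owns it (in which case the lease is renewed).
        with self._lock:
            if job_id in self._done:
                return False
            lease = self._leases.get(job_id)
            if lease is not None and lease.owner != worker_id and lease.expires_at > now:
                return False  # held by another worker and still live
            self._leases[job_id] = Lease(worker_id, now + self.ttl)
            return True

    def complete(self, job_id, worker_id, now):
        # Only the current, unexpired lease holder may mark the job done.
        with self._lock:
            lease = self._leases.get(job_id)
            if lease is None or lease.owner != worker_id or lease.expires_at <= now:
                return False
            self._done.add(job_id)
            del self._leases[job_id]
            return True


table = LeaseTable(ttl=30.0)
print(table.acquire("c", "machine-1", now=0.0))    # True
print(table.acquire("c", "machine-2", now=10.0))   # False: lease still live
# machine-1 crashes and never renews; after expiry machine-2 takes over.
print(table.acquire("c", "machine-2", now=31.0))   # True
print(table.complete("c", "machine-1", now=32.0))  # False: lease was lost
print(table.complete("c", "machine-2", now=33.0))  # True
print(table.acquire("c", "machine-3", now=40.0))   # False: already done
```

Expiry trades "exactly once" for "at least once": if machine-1 was merely slow rather than dead, it may have done part of the work before losing its lease, so the processing step itself should still be idempotent.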

As an alternative, why not consider using a stack such as ActionHeroJS as a cluster-aware API layer? It has Redis-backed cluster mechanics and Resque-based task management that cover all of your requirements, and it pairs really well with Firebase.