I am matching items from two lists, e.g. here the element c of A and the element c of B would match. I then do some processing and add the matched pair to another list.
- List A
  - a
  - b
  - c
- List B
  - c
  - d
To do this I watch for additions to both list A and list B, and check whether a match exists whenever something is added.
This works well, but I have too many inserts for a single client.
So I need to run my matcher on multiple machines to speed things up.
But I want each match to happen on only one machine, i.e. if machine 1 finds a match there is no point in machine 2 processing it as well.
I tried using atomic commits, but while this prevents concurrent matches from interfering with each other, the matching is still done twice.
How could I "lock" elements to make sure other machines don't consider them once the matching process has started?
Firebase does not provide native support for something like this, and additionally I would be concerned about the lack of idempotency in the streaming protocol itself. If you subscribe to updates on a topic and your node dies, on the next server start you will get a VALUE update, not a collection of all the INCREMENTAL updates that occurred while the node was down.
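To illustrate the restart problem, here is a minimal sketch in Python of how a worker might work out which additions it missed, by comparing the full snapshot it receives on restart against the keys it had already handled. This is not a Firebase API: `missed_additions` and `handled_keys` are hypothetical names, and the sketch assumes the worker persists the set of handled keys somewhere that survives a crash.

```python
from typing import Dict, Set


def missed_additions(handled_keys: Set[str], snapshot: Dict[str, object]) -> Set[str]:
    """Return the keys present in the full snapshot that were never handled.

    handled_keys: keys this worker processed before it went down
                  (assumed to be persisted outside the process).
    snapshot:     the complete current value of the list, as delivered
                  by a full VALUE update after the restart.
    """
    return set(snapshot) - handled_keys


# Before the crash the worker had handled "a" and "b".
handled_keys = {"a", "b"}

# On restart it receives the whole list, with no record of which
# entries were added while it was down.
snapshot = {"a": {"value": 1}, "b": {"value": 2}, "c": {"value": 3}}

to_process = missed_additions(handled_keys, snapshot)
```

Here `to_process` contains only `"c"`. Removals that happened during the downtime would need the reverse comparison, and the handled set itself has to be updated atomically with the processing for this to be safe.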
With good data structures you could "roll your own" facilities like this. After all, cluster-aware task processors with task idempotency and worker locking, like Resque and Celery, do exactly that with not much more in the way of base resources (Redis, a DB, etc.). You will need to add data sets to manage worker locking, job ID locking by workers, recovery/error handling facilities, etc. However, if you review the code they use to do this, you will quickly see that it takes more work than a simple StackOverflow post can cover.
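As a rough illustration of the "job ID locking by workers" idea, here is a minimal Python sketch of claim-before-process. The `ClaimStore` class is a hypothetical in-memory stand-in for a shared store that offers an atomic set-if-absent operation (for example Redis `SET` with the `NX` option, or a transaction that aborts when the claim key already holds a value). Threads stand in for machines. It deliberately leaves out the recovery side, such as expiring claims held by a worker that died.

```python
import threading
from typing import Dict, List, Tuple


class ClaimStore:
    """In-memory stand-in for a shared store with atomic set-if-absent."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}
        self._mutex = threading.Lock()

    def try_claim(self, item_id: str, worker_id: str) -> bool:
        """Record worker_id as the owner only if item_id is still unclaimed."""
        with self._mutex:
            if item_id in self._owners:
                return False  # another worker got there first
            self._owners[item_id] = worker_id
            return True


def run_worker(
    worker_id: str,
    store: ClaimStore,
    candidates: List[str],
    processed: List[Tuple[str, str]],
) -> None:
    """Process only the candidate matches this worker manages to claim."""
    for item_id in candidates:
        if store.try_claim(item_id, worker_id):
            # Real matching work would go here; we only record who did it.
            processed.append((item_id, worker_id))


store = ClaimStore()
processed: List[Tuple[str, str]] = []
candidates = ["c", "e", "f"]  # every worker sees the same potential matches

workers = [
    threading.Thread(
        target=run_worker,
        args=(f"machine-{n}", store, candidates, processed),
    )
    for n in (1, 2, 3)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

Each candidate ends up in `processed` exactly once, although which machine wins a given claim varies from run to run. The important property is that the claim is a single atomic step performed before any matching work starts, so a losing machine skips the item instead of doing the work and discarding it.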
As an alternative, why not consider using a stack such as ActionHeroJS as a cluster-aware API layer? It has Redis-backed cluster mechanics and Resque-based task management that cover all of your requirements, and it pairs really well with Firebase.