I am refactoring an analytics system that does a lot of calculation, and I need some ideas on possible architectural approaches to a data consistency issue I am facing.
Current Architecture
I have a queue-based system in which different requesting applications create messages that are eventually consumed by workers.
Each "Requesting App" breaks a large calculation down into smaller pieces that are sent to the queue and processed by the workers.
When all the pieces are finished, the originating "Requesting App" consolidates the results.
Also, the workers read information from a centralized database (SQL Server) in order to process the requests (important: the workers never change any data in the database, they only read it).
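For illustration, here is a minimal sketch of that flow; the queue client, message fields, and aggregation are my assumptions, not the actual system:

```python
# Illustrative only: the queue client, message shape, and aggregation
# below are assumptions about a system like this one, not its real code.
import json

def split_calculation(inputs, piece_size=100):
    """Break one large calculation into independently processable pieces."""
    return [inputs[i:i + piece_size] for i in range(0, len(inputs), piece_size)]

def enqueue_pieces(queue_client, calculation_name, inputs):
    pieces = split_calculation(inputs)
    for index, piece in enumerate(pieces):
        queue_client.publish(json.dumps({        # hypothetical publish API
            "calculation": calculation_name,
            "piece_index": index,
            "piece_count": len(pieces),
            "payload": piece,
        }))
    return len(pieces)

def consolidate(piece_results):
    """Run by the requesting app once every piece has reported back."""
    return sum(piece_results)                    # stand-in for the real aggregation
```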
Problem
OK, so far, so good. The problem arises when we add a web service that updates the information in the database. This can happen at any time, but it is critical that every "large calculation" originating from the same "Requesting App" sees the same data in the database.
For example:
Suppose worker W1 reads the database in state S0 and starts processing its piece, and the web service then updates the database to state S1. I just can't have worker W2, processing another piece of the same calculation, using state S1 of the database: for the whole calculation to be consistent, it should use the previous state S0.
Thoughts
A lock pattern to prevent the web service from changing the database while a worker is reading information from it.
Create a new layer between the database and the workers (a server that controls database caching per Requesting App).
I am leaning toward the second option, but I am not very confident about it.
Any brilliant ideas? Am I designing this wrong, or missing something?
Update:
Thanks everybody for the help.
Since I believe this problem might be common in other scenarios, I would like to share the solution we chose.
Thinking more thoroughly about the problem, I understood it for what it really is: a caching problem.
Now that the calculation has become distributed, I just needed my cache to become distributed as well.
To do that, we chose an in-memory database (a key-value/hash store) deployed as a separate server, in this case Redis.
Now, every time I start a job, I create an ID for the job and pass it along in all of that job's messages.
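Something like the following sketch; the uuid call and the exact message fields are my assumptions, the point from the text is simply one ID per job, carried by every message:

```python
# Sketch only: uuid and these message fields are assumptions; the idea
# is one ID per job, attached to every piece message.
import json
import uuid

def start_job(queue_client, pieces):
    job_id = str(uuid.uuid4())                   # one ID per "large calculation"
    for index, piece in enumerate(pieces):
        queue_client.publish(json.dumps({
            "job_id": job_id,                    # workers key their cache reads on this
            "piece_index": index,
            "payload": piece,
        }))
    return job_id
```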
When a worker needs some information from the database, it would: first check the Redis hash for its job ID and use the cached value if it is there; otherwise, read the data from SQL Server and store it in that job's hash, so every worker processing the same job sees exactly the same value.
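A minimal sketch of that lookup, assuming redis-py, one Redis hash per job, JSON-encoded values, and a placeholder load_from_sql standing in for the real SQL Server read (all names are illustrative):

```python
# Sketch only: redis-py, one Redis hash per job, JSON-encoded values and
# the load_from_sql callback are illustrative assumptions.
import json
import redis

r = redis.Redis(host="cache-server", port=6379)  # hypothetical host

def get_reference_data(job_id, key, load_from_sql):
    """Return the value of `key` as seen by this job, loading it at most once."""
    cached = r.hget(f"job:{job_id}", key)
    if cached is not None:
        return json.loads(cached)                # some worker already loaded it
    value = load_from_sql(key)                   # first reader hits SQL Server
    # hsetnx keeps the first value written for this key, so every later
    # read within the same job returns that same value even if the web
    # service updates the database in the meantime
    r.hsetnx(f"job:{job_id}", key, json.dumps(value))
    return json.loads(r.hget(f"job:{job_id}", key))
```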
At the end of the job, I clear all hashes associated with the job ID.
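The corresponding cleanup, under the same assumptions; an expiry on the job's hash can serve as a safety net in case a Requesting App dies before it gets to clean up:

```python
# Same assumptions as above: redis-py and one hash per job.
import redis

r = redis.Redis(host="cache-server", port=6379)

def finish_job(job_id):
    r.delete(f"job:{job_id}")                    # drop the job's cached snapshot

def set_job_safety_ttl(job_id, seconds=4 * 60 * 60):
    # optional safety net: let the hash expire if finish_job is never called
    r.expire(f"job:{job_id}", seconds)
```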