I am building a system that lets our clients convert PDF bank statements (from many different banks) into CSV, which is more useful because it can be imported into an accounting application. It will find tables on the PDF pages and convert them into CSV files.
I am going to use:
The Backend has to take responsibility for:
Problems that will occur:
As you can see, there are quite a lot of things to do. I do not want to discuss decisions (e.g. why Puppeteer and not direct access to the Excalibur API). This is rather the first, crude version. I have plenty of ideas to improve this system later.
My question is: should I use a message queue to simplify this system (make it more readable), or not? How could this system benefit from a queue such as AMQP, Azure Queues, or simply MongoDB used as a queue? What could a simple design (block diagram) of such a system look like with a message queue? I have no previous experience with message queues, but I feel one could help me give this system a better structure.
In general, queuing is not used to simplify a system. The simplest approach is to do the translation when the request is received and immediately respond with the result. The primary function of a queue is to add a layer of isolation between the data producer and the data consumer, in the form of a dynamic, ordered backlog of messages to work on. Using a queue can be useful in situations where:
Given that translating PDF files to CSV is a relatively expensive operation that doesn't need to complete immediately, writing incoming requests to a queue and responding with a job ID is a reasonable approach.
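To make the "enqueue and return a job ID" idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the queue and the job store are in-memory stand-ins for a real broker or database, and `submit`, `worker`, `convert_pdf_to_csv`, `jobs` and `pending` are hypothetical names, not part of your stack or of any library.

```python
import queue
import threading
import uuid

# Hypothetical in-memory stand-ins. A real deployment would likely use a
# durable broker (AMQP, Azure Queues) or a database collection instead.
jobs = {}                 # job_id -> {"status", "input", "result"}
pending = queue.Queue()   # ordered backlog of job IDs


def convert_pdf_to_csv(pdf_bytes):
    """Placeholder for the expensive table extraction step (assumption)."""
    return f"converted {len(pdf_bytes)} bytes"


def submit(pdf_bytes):
    """Producer: record the job, enqueue it, and return the ID right away."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "queued", "input": pdf_bytes, "result": None}
    pending.put(job_id)
    return job_id


def worker():
    """Consumer: process jobs in arrival order until a None sentinel arrives."""
    while True:
        job_id = pending.get()
        if job_id is None:
            pending.task_done()
            break
        job = jobs[job_id]
        job["status"] = "processing"
        try:
            job["result"] = convert_pdf_to_csv(job["input"])
            job["status"] = "done"
        except Exception as exc:
            # A failed conversion is recorded instead of killing the worker.
            job["status"] = "failed"
            job["result"] = str(exc)
        finally:
            pending.task_done()
```

The request handler would call `submit(...)` and return the job ID immediately, while one or more threads (or separate processes) run `worker()`. The client later polls the job's status by ID. The producer never waits on the conversion, which is the isolation described above.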