javascript node.js pdf architecture software-design

Should I use a queue system to handle PDF text recognition in a multitenant system?

I am building a system that lets our clients convert PDF bank statements (from many different banks) into a cleaner CSV form that can be imported into an accounting application. It will find the tables on the PDF pages and convert them into CSV files.

I am going to use:

A simple static web page with an HTML form to upload PDFs and choose which bank to process. It will also display the job status and allow downloading the result of the transformation (CSV files). It should operate without user authentication.
A backend running on Node.js (more on that later)
Excalibur
Puppeteer (to operate Excalibur)

The Backend has to take responsibility for:

Receiving a request from the UI (PDF payload)
Generating a new job ID, then:
1. sending it back to the UI
2. providing an HTTP resource for the UI to ask for the job status
Creating a new Puppeteer instance and passing it the received PDF and job ID
Waiting for Puppeteer to finish and receiving the archive file (Excalibur puts every page of the table in a separate CSV file)
Unpacking the archived CSV files
Normalizing them with transformers (written with https://www.npmjs.com/package/mississippi)
Sending the response to the UI (client)
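
To make the job ID and status steps above concrete, here is a minimal sketch of an in-memory job registry. The names `createJob`, `updateJob` and `getJobStatus`, the status strings, and the `'example-bank'` value are all hypothetical; an HTTP status resource would simply call `getJobStatus` with the ID from the request.

```javascript
const { randomUUID } = require('node:crypto');

// Hypothetical in-memory registry: job ID -> job record.
// Assumption: a single Node.js process; state is lost on restart.
const jobs = new Map();

// Create a job record and return the ID that is sent back to the UI.
function createJob(bank) {
  const id = randomUUID();
  jobs.set(id, { id, bank, status: 'pending', result: null });
  return id;
}

// Merge changes (for example a new status or a result) into an existing job.
function updateJob(id, patch) {
  const job = jobs.get(id);
  if (!job) throw new Error(`Unknown job: ${id}`);
  Object.assign(job, patch);
}

// What the status resource would return; null means "no such job".
function getJobStatus(id) {
  const job = jobs.get(id);
  return job ? { id: job.id, status: job.status } : null;
}
```

Because every lookup goes through the job ID, requests from different users do not share state even though they share one process.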

Problems that will occur:

Multi-tenancy: multiple users will access the system at once. I am used to PHP, which runs in the context of a single user session, and I know that a Node.js process stays resident in memory. I am going to handle this with the 'continuation-local-storage' package.
Communication between front end and back end: processing big PDF files will take a long time, and the user needs feedback in the meantime. That's why I need some sort of job ID to recognize clients.
Disabling the Excalibur database: my solution does not need to save any state.

As you can see, there are quite a lot of things to do. I do not want to discuss the design decisions (e.g. why Puppeteer and not direct access to the Excalibur API). This is only the first, crude version, and I have plenty of ideas to improve the system later.

My question is: should I use a message queue to simplify this system (make it more readable)? How could the system benefit from a queue such as AMQP, Azure Queues, or simply MongoDB used as a queue? What could a simple design (block diagram) of such a system look like with a message queue? I have no previous experience with message queues, but I feel one could help me structure this system better.

Solution

In general, queuing is not used to simplify a system. The simplest approach is to do the translation when the message is received and immediately respond with the result. The primary function of a queue is to add a layer of isolation between the data producer and the data consumer, in the form of a dynamic, ordered backlog of messages to work on. A queue can be useful in situations where:

Incoming messages do not need to be processed in real time.
Message production rates may temporarily exceed consumption rates.
Message consumers do not depend on message producers.
Processing order of messages is important.

Given that translating PDF files to CSV is a relatively expensive operation and doesn't need to complete immediately, writing incoming requests to a queue and responding with a job ID is a reasonable approach.
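
As a rough illustration of that approach, here is a sketch of an in-process FIFO queue with a single consumer. It is a stand-in for a real broker (AMQP, Azure Queues, and so on), not a replacement for one; the `JobQueue` class, its method names and its status strings are hypothetical, and `processFn` represents whatever performs the PDF-to-CSV conversion.

```javascript
const crypto = require('node:crypto');

// Minimal in-memory FIFO job queue with one consumer (illustrative only).
class JobQueue {
  constructor(processFn) {
    this.processFn = processFn; // async (payload) => result; hypothetical converter
    this.pending = [];          // backlog of job IDs, oldest first
    this.jobs = new Map();      // job ID -> { status, payload, result, error }
    this.running = false;
  }

  // Producer side: record the job and return its ID without waiting for the work.
  enqueue(payload) {
    const id = crypto.randomUUID();
    this.jobs.set(id, { status: 'queued', payload, result: null, error: null });
    this.pending.push(id);
    setImmediate(() => this.drain()); // start the consumer after the caller returns
    return id;
  }

  // Status lookup for polling from the UI.
  status(id) {
    const job = this.jobs.get(id);
    return job ? job.status : 'unknown';
  }

  // Consumer side: process one job at a time, in arrival order.
  async drain() {
    if (this.running) return; // a drain loop is already active
    this.running = true;
    while (this.pending.length > 0) {
      const job = this.jobs.get(this.pending.shift());
      job.status = 'processing';
      try {
        job.result = await this.processFn(job.payload);
        job.status = 'done';
      } catch (err) {
        job.error = err instanceof Error ? err.message : String(err);
        job.status = 'failed';
      }
    }
    this.running = false;
  }
}
```

The request handler would call `enqueue` and reply with the returned ID right away, and the client would poll `status` until the job is `done` or `failed`. Moving to an external queue later mainly changes where `pending` lives; the producer and consumer roles stay the same.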