Batch processing of a large data set in Laravel


I currently have a cron job in Laravel that takes data in chunks and calls an API.

The flow is as follows:

1. The cron job runs every 5 minutes.
2. It takes a chunk of 80 records.
3. It loops through them, calling the API 80 times, and takes the next 80 in the next cron cycle.

This method is so slow that with 10,000,000 records in the table it would take 125,000 cron cycles (about 434 days at one cycle every 5 minutes) to process them all. I previously used a chunk size of 1000, but that broke my system with a "Too many open files" exception, so I reduced the chunk size from 1000 to 80.

I know this is a very bad design.

I need to re-architect the current model and build something that can process records in parallel, at least 500-1000 at a time.

How do I do that in Laravel? Is it even possible in PHP, or do I have to look at an option like Node.js? Please tell me whether it is possible to use a queue. Even if I use a queue, will I be able to do parallel processing through it?

Update

I have now tried using Laravel queues.

The command running in the background:

php /Users/ajeesh/PhpstormProjects/untitled3/open-backend-v2/artisan queue:work database --tries=1 --timeout=56

My jobs are getting processed 3 times. I can't figure out why.

Can someone suggest a solution for this?


Solution

  • To run jobs in parallel you will need to install a process manager, such as Supervisor, that runs multiple workers (instances) for you. You can run as many workers as your server resources can handle.

    Keep in mind that each worker is a separate instance of your Laravel application, reflecting its state at the time the worker was started. If you change relevant code, such as the code for the job, you'll need to restart the workers through Supervisor so they pick up the newer version.

    Supervisor
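
    As a rough illustration, a Supervisor program entry for the worker command from the question might look like the sketch below. The program name `laravel-worker`, the worker count of 8, and the log path are assumptions made for the example, not values from the question; size `numprocs` to what your server can handle.

    ```ini
    ; Hypothetical program name; Supervisor starts numprocs copies of the command.
    [program:laravel-worker]
    process_name=%(program_name)s_%(process_num)02d
    command=php /Users/ajeesh/PhpstormProjects/untitled3/open-backend-v2/artisan queue:work database --tries=1 --timeout=56
    autostart=true
    autorestart=true
    ; Assumed worker count; each process is one parallel worker.
    numprocs=8
    redirect_stderr=true
    ; Assumed log location.
    stdout_logfile=/var/log/laravel-worker.log
    ; Should be longer than the slowest job, so a running job is not killed on stop.
    stopwaitsecs=60
    ```

    After editing the configuration, `supervisorctl reread` followed by `supervisorctl update` should load it. After deploying new job code, `php artisan queue:restart` asks the workers to exit once their current job finishes, and Supervisor then starts fresh ones.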

    Next, you'll need a way for each dispatched job to claim the correct available chunk.

    Job 1 will get records 1 to 80, job 2 will get records 81 to 160, and so on.

    You haven't shown your code, so this may not be a problem, but if it is, you can create a database table to track which chunks are available and which have not yet been processed.
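
    One way to hand out the ranges described above is to compute them up front and give each job its own range. The sketch below is plain PHP so it runs on its own; `chunkRanges` and the `ProcessChunk` job class mentioned in the comment are hypothetical names, and it assumes records can be addressed by position from 1 to the total count.

    ```php
    <?php
    // Hypothetical helper: split record positions 1..$total into inclusive
    // [start, end] ranges of at most $size records each.
    function chunkRanges(int $total, int $size): array
    {
        if ($size < 1) {
            throw new InvalidArgumentException('Chunk size must be at least 1.');
        }

        $ranges = [];
        for ($start = 1; $start <= $total; $start += $size) {
            // The last range is clipped to the total number of records.
            $ranges[] = [$start, min($start + $size - 1, $total)];
        }

        return $ranges;
    }

    // In a Laravel app each range would be passed to one queued job, for
    // example ProcessChunk::dispatch($start, $end), where ProcessChunk is a
    // hypothetical job class. Here the ranges are only printed.
    foreach (chunkRanges(200, 80) as [$start, $end]) {
        echo "Job for records {$start} to {$end}\n";
    }
    ```

    Because each job carries its own range, no two jobs should work on the same records, which may make a separate tracking table unnecessary. If the IDs in your table have gaps, ranges based on actual ID values would be a better fit than positions.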

    Regarding your job being fired 3 times, consider the command below:

    php /Users/ajeesh/PhpstormProjects/untitled3/open-backend-v2/artisan queue:work database --tries=1 --timeout=56
    

    Its function is to process jobs that are already in the queue; it does not add them. Maybe another piece of code is queuing the job 3 times?

    You won't need to start the worker manually once you install Supervisor. It keeps the workers running, and they process jobs as soon as they arrive (if you configured them that way).