java spring multithreading scheduler

How to maintain consistency while processing multiple files using Spring task scheduling?


I have a requirement where a scheduler triggers a task with a fixed delay of 2 minutes. The task picks up all the files from a directory (e.g. abc) and distributes them to multiple threads for processing. Each thread does the following:
1. Reads the data from a particular file (e.g. file1.csv).
2. After validation, appends some more data and writes the resulting data to another file (e.g. file1-updated.csv) in the updated directory (e.g. xyz).
3. Deletes the input file file1.csv from directory abc.

Files are pushed to the abc directory dynamically from another server whenever the end user performs some action. Every 2 minutes the scheduler triggers, picks up all the files, and distributes them to threads as explained above. Now the question: let's say there were 2 files, file1.csv and file2.csv, and the scheduler picked them up and distributed them to threads on the first trigger. Then file3.csv is pushed to the abc directory and the scheduler triggers again after 2 minutes. This time only file3.csv should be distributed to the threads, not file1.csv and file2.csv, since they were already picked up by the previous trigger and are still being processed. I have to ensure that only new files are distributed to threads for processing.

Can I use a file locking mechanism?
1. Lock the file (using Java's file locking mechanism) once it has been given to a thread.
2. When the scheduler triggers a second time and distributes the file to a thread, check whether the file is locked. Process it only if it is not locked; otherwise just exit the thread.
3. Release the lock and delete the file from the abc folder once processing of the file is complete.

Is there a better way than file locking to achieve this? Any help is appreciated.


Solution

  • One simple solution is to have the task (the one that picks up the files and distributes them to multiple threads) maintain a Set of all the files it has picked up that are still in progress. The next time it picks up files, it checks this Set, adds any files that are not already in it, and processes only those. The catch is that the threads which process the files have to remove each file from this Set once they are done with it. Because the Set is shared between the scheduler thread and the worker threads, you will have to use synchronized blocks (or a thread-safe Set) whenever you manipulate it.
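
A minimal sketch of the in-progress Set idea, under some assumptions: the class name `FileDispatcher`, the `processor` callback (standing in for the read/validate/write/delete work), and the `*.csv` filter are all hypothetical and not from the question. Instead of explicit synchronized blocks, the sketch swaps in a concurrent set from `ConcurrentHashMap.newKeySet()`, whose `add()` is atomic and returns `false` when the element is already present.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.function.Consumer;

// Hypothetical dispatcher; names and structure are illustrative only.
class FileDispatcher {
    // Files handed to a worker that have not finished processing yet.
    private final Set<Path> inProgress = ConcurrentHashMap.newKeySet();
    private final Path inputDir;
    private final ExecutorService workers;
    private final Consumer<Path> processor; // assumed: read, validate, write, delete

    FileDispatcher(Path inputDir, ExecutorService workers, Consumer<Path> processor) {
        this.inputDir = inputDir;
        this.workers = workers;
        this.processor = processor;
    }

    // Intended to be called from the scheduled method on every trigger.
    // Returns the files that were newly submitted in this run.
    List<Path> dispatchNewFiles() throws IOException {
        List<Path> submitted = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(inputDir, "*.csv")) {
            for (Path file : files) {
                // add() returns false if a worker is already processing this file.
                if (!inProgress.add(file)) {
                    continue;
                }
                // The file may have been deleted by a worker after it was listed.
                if (!Files.exists(file)) {
                    inProgress.remove(file);
                    continue;
                }
                try {
                    workers.submit(() -> {
                        try {
                            processor.accept(file);
                        } finally {
                            // Always release the entry, even if processing fails.
                            inProgress.remove(file);
                        }
                    });
                    submitted.add(file);
                } catch (RejectedExecutionException e) {
                    // The pool did not accept the task, so nothing is in progress.
                    inProgress.remove(file);
                    throw e;
                }
            }
        }
        return submitted;
    }
}
```

In a Spring application, a method annotated with `@Scheduled(fixedDelay = 120000)` could call `dispatchNewFiles()` on each trigger. Note that a file whose processing fails without being deleted is removed from the Set and will be picked up again on the next run, which may or may not be what you want.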