java spring file spring-integration splitter

Split File and Process each part while using a shared resource


I am using Spring Integration to poll for a file. This single file contains multiple reports. I want to split it into individual reports and save each one as a separate file.

<int-file:inbound-channel-adapter id="filesIn"
        directory="file:${fileInDirectory}"
        filename-pattern="*.txt"
        prevent-duplicates="true">
    <int:poller id="poller" fixed-delay="5000"/>
</int-file:inbound-channel-adapter>

<int:service-activator input-channel="filesIn"
                                   output-channel="filesOut"
                                   ref="handler"/>

<int-file:outbound-channel-adapter id="filesOut"
                                   directory="file:${archiveDirectory}"
                                   delete-source-files="true"/>

The handling method inside the handler looks like the following.

public List<ReportContent> splitTextToReports(File file) {
    List<ReportContent> reports = new ArrayList<>();
    // split the file
    // store each part's text in a ReportContent object
    // add it to the list of ReportContent
    return reports;
}

ReportContent has the following fields:

  • reportData (the text that will be saved in new file)

  • reportType

  • reportDate

Some additional processing is required for each ReportContent:

  • Look up report path to save the report, based on the report type. This is done through a service call.
  • Save report data in a table

The following method processes each element of the list returned by the method above.

public void processReportContent (ReportContent reportContent){
   // process report content and save the file in the relevant place
}

There are two parts to the question.

  1. How can a splitter take over right after the master file is read, so that each report is processed as one of the split items?
  2. The service that looks up the report path should share a common HashMap across all split items. If the map already holds a value for the report type, it is returned from the map. Otherwise a separate API call is made to retrieve the report path for that report type, and the report type and the value (report path) returned by the call are stored in the map. The point of the map is to avoid unnecessary API calls.

Solution

  • To process items in parallel, there has always been a trick for <splitter>: use a downstream ExecutorChannel, so that while iterating over the split items we move on to the next one immediately after sending the previous one.
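As a rough plain-Java analogy of that hand-off (this is not Spring Integration API; the class `ParallelDispatch`, the method `dispatchAll`, and the one-minute timeout are hypothetical names and values chosen for illustration), the loop below passes each item to a thread pool and continues with the next item without waiting for the handler:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper: mimics what an executor-backed channel does for split items.
final class ParallelDispatch {

    // Hands every item to the pool, then waits until all handlers have finished.
    static <T> void dispatchAll(List<T> items, Consumer<T> handler, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (T item : items) {
                // execute() returns immediately, so the loop moves on to the next item
                pool.execute(() -> handler.accept(item));
            }
        } finally {
            pool.shutdown(); // no new tasks; already submitted tasks still run
        }
        try {
            // The timeout is an arbitrary value for this sketch
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("handlers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for handlers", e);
        }
    }
}
```

Because handlers run concurrently, anything they share (such as the report path map from the question) has to be thread-safe.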

    In addition, for better throughput the splitter supports an Iterator for streaming.

    I was going to suggest the FileSplitter for your task, but I guess you don't split by lines but by some other identifier. Maybe your content is just XML or JSON, which makes it fairly easy to determine each part of the content.

    Given that, it might not be so easy to provide an Iterator implementation for your case.
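For illustration only, here is a minimal sketch of such an Iterator. It assumes something the question does not state: that every report in the master file is terminated by a line equal to a known delimiter. The class name `ReportIterator` and the `DELIMITER` value are hypothetical; the real split rule would replace the delimiter check.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical: lazily yields one report text at a time instead of building a full list.
final class ReportIterator implements Iterator<String> {
    // Assumption for this sketch, not taken from the question
    static final String DELIMITER = "---END-REPORT---";

    private final BufferedReader reader;
    private String next; // the report to return next, or null when the input is exhausted

    ReportIterator(BufferedReader reader) {
        this.reader = reader;
        this.next = readNext();
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        String current = next;
        next = readNext();
        return current;
    }

    // Reads lines up to the next delimiter (or end of input) and joins them into one report.
    private String readNext() {
        try {
            StringBuilder sb = new StringBuilder();
            boolean sawLine = false;
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.equals(DELIMITER)) {
                    return sb.toString();
                }
                if (sawLine) {
                    sb.append('\n');
                }
                sb.append(line);
                sawLine = true;
            }
            // Trailing text without a closing delimiter is returned as the last report
            return sawLine ? sb.toString() : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Only one report is held in memory at a time, which is the benefit of streaming for large master files. The caller remains responsible for closing the reader.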

    However, I guess it doesn't matter. You already have the split logic that builds your List<ReportContent>.

    Regarding the ConcurrentMap.

    How about taking a look at Spring's @Cacheable support for your "hard" service, so that the next call for the same key just returns the value from the cache?
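If the cache abstraction is not an option, the same "call the API once per report type" behaviour can be sketched by hand with a ConcurrentHashMap. Everything named here is hypothetical: the class `CachingReportPathService`, the `remoteLookup` function standing in for the real API call, and the use of a plain String as the report type.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical service: remembers report paths by report type across all split items.
final class CachingReportPathService {
    private final Map<String, String> pathsByType = new ConcurrentHashMap<>();
    private final Function<String, String> remoteLookup; // stand-in for the real API call

    CachingReportPathService(Function<String, String> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    // Returns the cached path if present; otherwise calls the lookup once and stores the result.
    String getPath(String reportType) {
        return pathsByType.computeIfAbsent(reportType, remoteLookup);
    }
}
```

ConcurrentHashMap.computeIfAbsent is atomic per key, so this stays safe when split items are processed on several threads. A plain HashMap would not be.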

    To have that service resolve the target directory, you can use the directory-expression attribute on the <int-file:outbound-channel-adapter>:

    <int-file:outbound-channel-adapter directory-expression="@reportPathService.getPath(payload)" /> 
    

    You can use the same technique for the file name as well.

    Note: pay attention to the default header for the file name, FileHeaders.FILENAME.