The end goal is to create a single flowFile containing JSON with the information for every matched flowFile, in the following format:
{
  "matched": [
    {
      "id": "${uuid}",
      "fileName": "${filename}"
    }
  ]
}
I have one flowFile that contains a list of IDs, and each incoming flowFile may carry one of these IDs. I need to use the flowFile with all of the IDs as a reference and check each incoming flowFile for a matching ID.
In other words, as each flowFile comes in, look it up in the main ID flowFile to see whether its ID is present.
Whenever a match is found, build the JSON entry for the matching flowFile and append it to a single report flowFile in the format above.
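The matching logic described above can be sketched outside NiFi in plain Python. This is a minimal sketch under assumptions the question does not state: the reference flowFile lists one ID per line, each incoming flowFile exposes its ID in a hypothetical `record.id` attribute, and `FlowFile`, `load_reference_ids`, and `match_and_report` are illustrative names, not NiFi APIs. Loading the reference IDs into a set once makes each lookup constant-time instead of rescanning the reference content for every flowFile.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    # Hypothetical stand-in for a NiFi flowFile; only attributes are modelled.
    attributes: dict = field(default_factory=dict)


def load_reference_ids(content: str) -> set:
    # Assumption: the reference flowFile holds one ID per line.
    return {line.strip() for line in content.splitlines() if line.strip()}


def match_and_report(reference_content, incoming, id_attr="record.id"):
    """Return the report JSON for every incoming flowFile whose ID is in the reference."""
    ref_ids = load_reference_ids(reference_content)
    matched = []
    for ff in incoming:
        # A missing ID attribute yields None, which is never in ref_ids.
        if ff.attributes.get(id_attr) in ref_ids:
            matched.append({"id": ff.attributes["uuid"],
                            "fileName": ff.attributes["filename"]})
    return json.dumps({"matched": matched}, indent=2)


if __name__ == "__main__":
    incoming = [
        FlowFile({"uuid": "u1", "filename": "a.txt", "record.id": "id-1"}),
        FlowFile({"uuid": "u2", "filename": "b.txt", "record.id": "id-2"}),
    ]
    print(match_and_report("id-1\nid-3\n", incoming))
```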
Could someone explain which processors I can use to achieve this?
I solved this using PutDistributedCache and FetchDistributedCache together with RetryFlowFile.
After that, I use RouteOnAttribute to determine whether the flowFile content needs to be modified.
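As a rough illustration, the cache-based flow can be simulated in plain Python. This is not NiFi code: a dict stands in for the distributed map cache, each reference ID is assumed to be stored under its own cache key, and the attribute names `record.id` and `cached.value` are hypothetical. It also assumes that RetryFlowFile is there to loop flowFiles from FetchDistributedCache's `not-found` relationship back until the reference IDs have been written to the cache, giving up via `retries_exceeded` after a fixed number of attempts. The `flowfile.retries` attribute and the limit of 3 are meant to mirror RetryFlowFile's defaults, but treat them as assumptions.

```python
from collections import deque


class InMemoryCache:
    """Stand-in for the distributed map cache service (not a NiFi API)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)  # None models a cache miss


def run_flow(incoming, cache, reference, cached_at_tick, max_retries=3):
    """Simulate Put/FetchDistributedCache, RetryFlowFile, RouteOnAttribute and
    UpdateAttribute over a list of attribute dicts (one dict per flowFile)."""
    queue = deque(dict(ff) for ff in incoming)  # copy so callers' dicts are untouched
    matched, unmatched = [], []
    tick = 0
    while queue:
        if tick == cached_at_tick:
            # PutDistributedCache: reference IDs become available at this tick.
            for key, value in reference.items():
                cache.put(key, value)
        tick += 1

        ff = queue.popleft()
        value = cache.get(ff.get("record.id"))  # FetchDistributedCache
        if value is None:
            # "not-found" -> RetryFlowFile (max_retries assumed to mirror its default)
            retries = ff.get("flowfile.retries", 0) + 1
            if retries > max_retries:
                unmatched.append(ff)  # "retries_exceeded"
            else:
                ff["flowfile.retries"] = retries
                queue.append(ff)  # "retry" loops back to the fetch
            continue

        ff["cached.value"] = value  # cache value written into an attribute
        needs_update = bool(ff["cached.value"])  # RouteOnAttribute decision
        del ff["cached.value"]  # UpdateAttribute: drop the large attribute early
        (matched if needs_update else unmatched).append(ff)
    return matched, unmatched


if __name__ == "__main__":
    m, u = run_flow([{"record.id": "1"}, {"record.id": "2"}],
                    InMemoryCache(), {"1": "x"}, cached_at_tick=2)
    print([f["record.id"] for f in m], [f["record.id"] for f in u])
```

In this sketch, flowFiles that arrive before the reference is cached are retried rather than dropped, and only IDs still missing after the retry limit end up unmatched.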
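Note: I believe the default limit is 256 characters for the cached value that FetchDistributedCache writes into an attribute (the attribute set from reading the cache on the right side of the diagram; the setting is the Max Length To Put In Attribute property). I needed 10k+ characters, so that limit had to be raised. Because of that, MAKE SURE you use UpdateAttribute to remove the attribute as soon as you are done with it, since flowFile attributes are kept in memory and large ones can cause memory issues.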
I am processing about 10k files with this approach without any issues.