I have a flow that fetches all files from a given directory. They can be gzip, zip or csv, with each gzip or zip holding a single csv file. I route on MIME type, decompress the gzip files, unpack the zip files, and then bring what are now ALL csv files back together. This is working. I then want to create a zip archive of all of these csv files, and MergeContent seemed like a good candidate (short of using ExecuteStreamCommand to run zip from the OS).
But no matter what I do, the results are inconsistent:
Clearly I'm throwing darts hoping something will stick. Documentation (and the general wisdom of the web) hasn't been particularly helpful in understanding how this works, what a "bin" is, what a "bundle" is, and most of it seems geared towards breaking apart a single flow file, doing some processing, then bringing it back together. That's not what I'm doing. I'm starting with multiple flow files and want to bring all of them always, every time, to a single flow file.
If the answer is that this can't be done with MergeContent, then I'll just run zip through the OS. That would mean writing the files to disk and then zipping them, though, and I wanted to try to keep this native to NiFi.
Again, I started with default properties, except changing Merge Format to ZIP, and then made my modifications from there. And, yes, I am using the "merged" relationship.
As it turns out, the one property I did not play with, Max Bin Age, was the key. Matt's excellent explanation here gave tremendous insight and provided the ultimate solution.
Current config:
Merge Strategy: Bin-Packing Algorithm
Merge Format: ZIP
Attribute Strategy: Keep Only Common Attributes
Correlation Attribute Name: No value set
Minimum Number of Entries: 500
Maximum Number of Entries: 1000
Minimum Group Size: 0 B
Maximum Group Size: No value set
Max Bin Age: 15 secs
Maximum Number of Bins: 5
Compression Level: 1
Keep Path: false
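
For anyone else puzzling over what a "bin" is, here is how I now picture it, as a rough Python sketch. This is my own simplified model of the behaviour, not NiFi's actual code: the Bin class and the bin_is_ready function are names I made up, the defaults mirror the config above, and Minimum Group Size is left out because it is 0 B here. The idea is that a bin collects flow files and is only merged once it hits the minimum entry count, fills up, or gets older than Max Bin Age.

```python
from dataclasses import dataclass, field


@dataclass
class Bin:
    """Hypothetical stand-in for a MergeContent bin (not a NiFi class)."""
    created_at: float                      # time (seconds) the first flow file arrived
    entries: list = field(default_factory=list)


def bin_is_ready(bin_, now, min_entries=500, max_entries=1000, max_bin_age=15.0):
    """Return why the bin would be merged, or None if it keeps waiting.

    Simplified assumption: only entry counts and bin age are considered.
    """
    count = len(bin_.entries)
    if count >= max_entries:
        return "full"
    if count >= min_entries:
        return "minimum reached"
    if max_bin_age is not None and now - bin_.created_at >= max_bin_age:
        return "max bin age"
    return None


# Three csv flow files, far below the minimum of 500 entries.
small_bin = Bin(created_at=0.0, entries=["a.csv", "b.csv", "c.csv"])

print(bin_is_ready(small_bin, now=5.0))                      # None
print(bin_is_ready(small_bin, now=15.0))                     # max bin age
print(bin_is_ready(small_bin, now=60.0, max_bin_age=None))   # None
```

In this model, a directory with only a handful of files never reaches the minimum, so without Max Bin Age the bin just sits there. With Max Bin Age set to 15 secs, whatever has been collected is merged into one zip once the bin is old enough.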