Does NiFi have a synchronization mechanism for knowing when something has finished processing?
I ingest some data and do some processing. At step N-1 I want to know that all the data has been processed so that I can proceed to the (final) step N.
[GetFile / 1,000,000 lines] ----> [ Proc1 / process step 0 ] -----> [ Proc2 / process step 1 ] .... [ PutSQL / insert into db ] ---> [ Proc to let me know that I've inserted all the data in the table ] ----> [ ProcN / Run aggregates on data for example ]
NiFi does not really have an explicit synchronization feature built into the framework, but some processors have features that help synchronize activity. I can think of a few possible ways to make your flow work:
Scheduling - you could schedule the GetFile processor and the later aggregate operation with CRON-driven scheduling, assuming the duration of each operation is relatively predictable.
MonitorActivity - the MonitorActivity processor can generate a flowfile when no data has passed through it for a configured period. You could place it downstream of PutSQL so that it fires once inserts have stopped, signaling that the aggregates should begin.
MergeContent (Simple) - a MergeContent processor could combine the results of PutSQL into a single flowfile that triggers the aggregate operation. You would have to experiment with the bin properties (such as Minimum Number of Entries and Max Bin Age) to get this to work right.
MergeContent (Defragment) - MergeContent has a Defragment merge strategy designed to reassemble the fragments of a larger file. It requires specific attributes (such as fragment.identifier, fragment.index, and fragment.count) to be set on the flowfiles; see the "Reads Attributes" section at the bottom of the docs. The behavior seems close to what you want, but setting those fragment attributes may be difficult.
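To illustrate why the Defragment approach gives a "everything is done" signal, here is a minimal Python sketch of the correlation logic, outside of NiFi. It is not NiFi's implementation or API: flowfiles are modeled as plain dicts, and `tag_fragments` and `DefragmentCollector` are hypothetical names. The attribute names mirror the ones the Defragment strategy reads; the 0-based indexing is an assumption, and the collector sorts by index so the base does not matter.

```python
import uuid
from collections import defaultdict


def tag_fragments(records):
    """Hypothetical helper: attach fragment attributes to each record,
    as a split step upstream of PutSQL might. Flowfiles are modeled as dicts."""
    fragment_id = str(uuid.uuid4())
    count = len(records)
    return [
        {
            "fragment.identifier": fragment_id,
            "fragment.index": str(i),  # 0-based here (assumption)
            "fragment.count": str(count),
            "content": rec,
        }
        for i, rec in enumerate(records)
    ]


class DefragmentCollector:
    """Hypothetical model of a Defragment bin: hold fragments per identifier
    until all `fragment.count` pieces have arrived, then release them once."""

    def __init__(self):
        self._bins = defaultdict(dict)  # identifier -> {index: flowfile}

    def offer(self, flowfile):
        """Add one fragment. Return the contents in index order when the set
        is complete (the single 'all inserted' trigger), else None."""
        fid = flowfile["fragment.identifier"]
        index = int(flowfile["fragment.index"])
        count = int(flowfile["fragment.count"])
        bin_ = self._bins[fid]
        bin_[index] = flowfile  # a re-delivered fragment overwrites its slot
        if len(bin_) < count:
            return None
        del self._bins[fid]
        return [bin_[i]["content"] for i in sorted(bin_)]


frags = tag_fragments(["row1", "row2", "row3"])
collector = DefragmentCollector()
for ff in reversed(frags):  # fragments may arrive out of order
    merged = collector.offer(ff)
print(merged)
```

The key design point is that completion is decided by counting against `fragment.count`, not by waiting for a quiet period, so the downstream aggregate step fires exactly once per original file. Note that this sketch holds every fragment in memory until the set completes.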