apache-nifi, kylo

Running record count from the SplitRecord processor in NiFi


Is there a way to get the fragment index from the SplitRecord processor in NiFi? I am splitting a very large XLS file (4 million records) with "Records Per Split" = 100000.

Now I want to process only the first 2 splits to check the quality of the file, and reject the rest of the file.

I can see that fragment.index is set by other split processors (e.g. SplitJson), but not by SplitRecord. Is there any other workaround?


Solution

  • Method 1:

    This can be achieved with the ControlRate processor.

    [Screenshot: ControlRate processor configuration]

    With this configuration, ControlRate releases 2 flowfiles every minute.

    [Screenshot: the flow]

    Then set the FlowFile Expiration on the queue in front of ControlRate to something like 10 sec (or lower if needed). The first 2 flowfiles are released, and the rest expire in the queue.

  • Method 2:

    Use the SplitText processor, followed by a RouteOnAttribute processor with a new property set to:

    ${fragment.index:le(2)}
    

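    This expression lets only the first 2 fragment indexes through. Flowfiles that do not match are routed to the unmatched relationship, which can be auto-terminated to reject the rest of the file.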

    Refer to this link for splitting a big file in NiFi.
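Both methods can be modeled in plain Python to show why each one keeps only the first 2 splits. This is a hedged simulation, not NiFi code. `FlowFile`, `split_records`, `control_rate_with_expiration` and `route_on_fragment_index` are hypothetical names. The 1-based `fragment.index` is an assumption. ControlRate is modeled as releasing 2 flowfiles per 60-second window, with a 10-second FlowFile Expiration on the incoming queue. With 4,000,000 records at 100,000 per split, there are 40 splits.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Hypothetical stand-in for a NiFi flowfile: only its attributes are modeled."""
    attributes: dict = field(default_factory=dict)


def split_records(total_records: int, records_per_split: int) -> list:
    """Model a split processor: one flowfile per chunk.
    Assumption: fragment.index starts at 1."""
    count = -(-total_records // records_per_split)  # ceiling division
    return [
        FlowFile({
            "fragment.index": str(i + 1),
            "fragment.count": str(count),
            "record.count": str(min(records_per_split, total_records - i * records_per_split)),
        })
        for i in range(count)
    ]


def control_rate_with_expiration(queue, max_per_window, window_secs, expiration_secs):
    """Method 1 model: every split is queued at t=0. ControlRate releases at most
    max_per_window flowfiles per window. Once the queue's FlowFile Expiration is
    reached, whatever is still queued is dropped (0 means never expire)."""
    released, pending, t = [], list(queue), 0
    while pending:
        if expiration_secs > 0 and t >= expiration_secs:
            break  # everything left in the queue has expired
        released.extend(pending[:max_per_window])
        pending = pending[max_per_window:]
        t += window_secs  # next release happens one window later
    return released


def route_on_fragment_index(flowfiles, max_index=2):
    """Method 2 model: mimic RouteOnAttribute with ${fragment.index:le(2)}.
    Matches go to the property's relationship, the rest go to 'unmatched'."""
    matched, unmatched = [], []
    for ff in flowfiles:
        target = matched if int(ff.attributes["fragment.index"]) <= max_index else unmatched
        target.append(ff)
    return matched, unmatched


splits = split_records(4_000_000, 100_000)
print(len(splits))  # 40

kept = control_rate_with_expiration(splits, max_per_window=2, window_secs=60, expiration_secs=10)
print([ff.attributes["fragment.index"] for ff in kept])  # ['1', '2']

first, rest = route_on_fragment_index(splits)
print(len(first), len(rest))  # 2 38
```

Method 1 depends on timing. It works only because the expiration (10 sec) is shorter than the ControlRate window (1 minute). Method 2 depends only on the attribute value, so it does not matter how fast the splits arrive.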