
NiFi SplitRecord hangs


I'm testing NiFi SplitRecord with a small file of only 11 records. However, SplitRecord hangs for a long time, and I have no clue what it is doing. (Screenshot: Processor Hung)

SplitRecord properties: (screenshot: more properties)

Does Records Per Split control the maximum, the minimum, or the exact number of records per split? If the total number of records is less than Records Per Split, what is the behavior of SplitRecord? Does it wait until a timeout and then put all on-hold records into a single split?

After about 10 minutes, or after a random number of start/stop/terminate/restart cycles, the processor may be triggered to split the data sooner.


Solution

  • Records Per Split controls the maximum; see "SplitRecord.java" for the code. If there are fewer records than the RECORDS_PER_SPLIT value, it will immediately push them all out. It does not wait for more records to arrive.

    However, it does look like it creates a new FlowFile even if the total record count is less than the RECORDS_PER_SPLIT value, meaning it writes to disk regardless of whether a split really occurred.
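
    To make the "maximum" semantics concrete, here is a minimal sketch of the chunking behavior described above. It is not the code from SplitRecord.java; the class name SplitSketch, the method splitSizes, and the value 10000 are hypothetical and used only for illustration.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Illustration only: models "Records Per Split" as an upper bound per split.
    // SplitSketch and splitSizes are hypothetical names, not part of NiFi.
    final class SplitSketch {

        /** Returns the size of each split for the given record count and per-split maximum. */
        static List<Integer> splitSizes(int totalRecords, int maxRecordsPerSplit) {
            if (maxRecordsPerSplit <= 0) {
                throw new IllegalArgumentException("maxRecordsPerSplit must be positive");
            }
            List<Integer> sizes = new ArrayList<>();
            int remaining = totalRecords;
            while (remaining > 0) {
                // Each split takes at most maxRecordsPerSplit records; the last one may be smaller.
                int size = Math.min(remaining, maxRecordsPerSplit);
                sizes.add(size);
                remaining -= size;
            }
            return sizes;
        }

        public static void main(String[] args) {
            // 11 records with a (hypothetical) maximum of 10000: one split, emitted without waiting.
            System.out.println(splitSizes(11, 10000)); // prints [11]
            // 11 records with a maximum of 5: two full splits and one partial split.
            System.out.println(splitSizes(11, 5));     // prints [5, 5, 1]
        }
    }
    ```

    In this model, an input of 11 records never blocks on a larger limit, which is why a long hang points to the environment rather than to the Records Per Split setting.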

    I would probably investigate two things:

    1. Host memory - how much memory does the host have? How much is configured as NiFi max heap? How much total system memory is in use/free? NiFi performs best when plenty of system memory is left for file cache.
    2. Host's disks, specifically the disk that holds the Content Repository. Capacity? IO? Is it shared with other services? FlowFile content is written to the Content Repository, so if the disk is shared with the OS or other busy services (or other NiFi repos), content modification can slow down considerably.
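
    As a rough starting point for the disk check, the sketch below uses the standard java.nio.file API to report how full the file store behind a directory is and whether two directories sit on the same file store. The class name RepoDiskCheck and the paths passed in are assumptions; substitute the directories your own nifi.properties points at. It does not measure IO load, only capacity and sharing.

    ```java
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.nio.file.FileStore;
    import java.nio.file.Files;
    import java.nio.file.Path;

    // Illustration only: RepoDiskCheck is a hypothetical helper, not part of NiFi.
    final class RepoDiskCheck {

        /** Fraction (0.0 to 1.0) of the file store holding dir that is currently in use. */
        static double usedFraction(Path dir) {
            try {
                FileStore store = Files.getFileStore(dir);
                long total = store.getTotalSpace();
                if (total <= 0) {
                    return 0.0; // some virtual file systems report no capacity
                }
                double used = 1.0 - (double) store.getUsableSpace() / total;
                return Math.max(0.0, Math.min(1.0, used));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        /** True if both directories are backed by the same file store (same disk or volume). */
        static boolean sameFileStore(Path a, Path b) {
            try {
                return Files.getFileStore(a).equals(Files.getFileStore(b));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        public static void main(String[] args) {
            // Assumed usage: first argument is the Content Repository directory,
            // second is another directory to compare against (for example the OS root).
            Path contentRepo = Path.of(args.length > 0 ? args[0] : ".");
            Path other = Path.of(args.length > 1 ? args[1] : ".");
            System.out.printf("used fraction: %.2f%n", usedFraction(contentRepo));
            System.out.println("shares file store with other dir: " + sameFileStore(contentRepo, other));
        }
    }
    ```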

    Note: your NiFi version is over 3 years old; please consider upgrading.