
Kylo | High Water Mark functionality


I have a feed that runs every five minutes and uses the load/release Hive watermark feature. Consider a scenario where a job execution takes more than five minutes, so the watermark commit has not happened yet when the next run is triggered.

In this scenario, will Kylo launch another feed instance with the old watermark, or will it wait for the commit to happen?


Solution

  • If a watermark is active (i.e. a flowfile has loaded the watermark and is processing it but has not yet released it), a new flowfile attempting to load the same watermark will be blocked. It waits until the active watermark is released (via a commit or reject).

    You can control this behavior via the 'Active Water Mark Strategy' property on the 'LoadHighWaterMark' processor. This helps in cases where processing is stuck or is taking longer than expected. If the strategy is set to 'Yield', the processor yields while the watermark is active. The number of yields is configured via the 'Max Yield Count' processor property; once this count is reached, the processor routes the flowfile to the 'ActiveFailure' relationship. The duration of each yield is set via Settings -> Yield Duration on the processor. If the strategy is set to 'Route', the processor immediately routes the flowfile to the 'ActiveFailure' relationship.

    Take care to place a 'ReleaseHighWaterMark' processor at the leaves of the Success, Failure, and ActiveFailure relationship paths. It supports two modes: commit and reject.
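The load/release behavior described above can be modeled with a small simulation. This is a minimal Python sketch under stated assumptions, not Kylo's implementation. `WatermarkStore`, `ActiveStrategy`, `load`, and `release` are hypothetical names. The real processor yields and is retried on later scheduled runs, which this single-threaded model collapses into one loop. The sketch also assumes that 'commit' stores the new watermark value while 'reject' discards it and keeps the previously committed one.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Tuple


class ActiveStrategy(Enum):
    """The two 'Active Water Mark Strategy' values described above."""
    YIELD = "Yield"
    ROUTE = "Route"


@dataclass
class WatermarkStore:
    """Hypothetical in-memory model of one feed's watermark state (not Kylo code)."""
    committed: Dict[str, Optional[str]] = field(default_factory=dict)  # name -> last committed value
    active: Dict[str, str] = field(default_factory=dict)  # name -> flowfile currently holding it
    last_yield_count: int = 0  # yields performed by the most recent load() call

    def load(self, name: str, flowfile_id: str, strategy: ActiveStrategy,
             max_yield_count: int = 3,
             yield_duration: float = 0.0) -> Tuple[str, Optional[str]]:
        """Try to load a watermark; return (relationship, value)."""
        self.last_yield_count = 0
        while name in self.active:
            # 'Route': give up immediately while another flowfile holds the watermark.
            if strategy is ActiveStrategy.ROUTE:
                return "ActiveFailure", None
            # 'Yield': retry until 'Max Yield Count' is used up, then route to ActiveFailure.
            if self.last_yield_count >= max_yield_count:
                return "ActiveFailure", None
            self.last_yield_count += 1
            time.sleep(yield_duration)  # stands in for Settings -> Yield Duration
        # Watermark is free: mark it active and hand out the last committed value.
        self.active[name] = flowfile_id
        return "success", self.committed.get(name)

    def release(self, name: str, flowfile_id: str, mode: str,
                new_value: Optional[str] = None) -> None:
        """Release the watermark in 'commit' or 'reject' mode (assumed semantics)."""
        if self.active.get(name) != flowfile_id:
            raise ValueError(f"{flowfile_id} does not hold watermark {name!r}")
        if mode == "commit":
            self.committed[name] = new_value  # advance the watermark
        elif mode != "reject":
            raise ValueError(f"unknown release mode {mode!r}")
        # For 'reject', the previously committed value is left unchanged.
        del self.active[name]
```

Applied to the question's scenario: if run 1 loads the watermark and is still processing when run 2 fires five minutes later, run 2 cannot load it in this model, so it never receives the old value. It either yields and retries or goes to 'ActiveFailure'. Once run 1 commits, the next successful load returns the newly committed value.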