
NiFi DetectDuplicate Not Detecting Duplicates


I am using the DetectDuplicate processor within a flow but am seeing some confusing behavior. The processor is configured as follows:

Cache Entry Identifier: ${rk.id}
FlowFile Description: Empty string set
Age Off Duration: 10s
Distributed Cache Service: DistributedMapCacheClientService
Cache The Entry Identifier: true

The "duplicate" relationship is automatically terminated. Concurrency is set to 1.
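For reference, the behavior this configuration is expected to produce can be sketched as a toy model. This is not NiFi's actual implementation; the class `DedupModel` and its `route` method are hypothetical names, and the model assumes an unbounded cache where an entry counts as a duplicate only while it is younger than the age-off duration.

```python
class DedupModel:
    """Toy model of a DetectDuplicate-style check (hypothetical, not NiFi code)."""

    def __init__(self, age_off_seconds):
        self.age_off_seconds = age_off_seconds
        self.seen = {}  # identifier -> time (seconds) at which it was cached

    def route(self, identifier, now):
        """Return the relationship a FlowFile with this identifier would take."""
        cached_at = self.seen.get(identifier)
        if cached_at is not None and now - cached_at < self.age_off_seconds:
            return "duplicate"
        # Unknown or aged-off identifier: cache it and pass the FlowFile on.
        self.seen[identifier] = now
        return "non-duplicate"


model = DedupModel(age_off_seconds=10)
first = model.route("rk-1", now=0.0)
second = model.route("rk-1", now=1.5)    # same id, 1.5 s later
third = model.route("rk-1", now=11.0)    # same id, after the age-off window
```

Under this model, two FlowFiles with the same rk.id arriving less than 2 seconds apart should never both reach the output queue, which is why the observed behavior is surprising.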

However, I'm seeing multiple copies of FlowFiles with the same rk.id on the output queue, even though they ran through the processor less than 2 seconds apart. How is this possible? I tried increasing the Age Off Duration to 5m, and it made no difference. I also tried scheduling the processor to run only every 500ms, thinking there might be some delay in writing to the cache, but two FlowFiles with the same rk.id that were processed 1s apart still showed up in the output queue. What am I missing?


Solution

  • I think I figured this out. It looks like the cache was full and was not accepting new values. We had much less traffic this morning, and the deduplication seems to have run properly.
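
One way a full cache could produce this symptom can be sketched with a small simulation. It assumes the cache is bounded and evicts its least recently used entry once capacity is reached (NiFi's DistributedMapCacheServer exposes Maximum Cache Entries and Eviction Strategy properties, but whether eviction is what happened in this flow is a guess). The names `BoundedDedupCache` and `count_missed_duplicates` are hypothetical and this is not NiFi code.

```python
from collections import OrderedDict


class BoundedDedupCache:
    """Toy bounded cache with least-recently-used eviction (hypothetical)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()

    def is_duplicate(self, identifier):
        if identifier in self.entries:
            self.entries.move_to_end(identifier)  # mark as recently used
            return True
        self.entries[identifier] = True
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict the least recently used
        return False


def count_missed_duplicates(identifiers, max_entries):
    """Count true duplicates that the bounded cache fails to flag."""
    cache = BoundedDedupCache(max_entries)
    truly_seen = set()
    missed = 0
    for identifier in identifiers:
        flagged = cache.is_duplicate(identifier)
        if identifier in truly_seen and not flagged:
            missed += 1
        truly_seen.add(identifier)
    return missed


# Low traffic: few distinct ids between repeats, so nothing is evicted.
low_traffic_missed = count_missed_duplicates(["a", "b", "a", "b"], max_entries=3)
# High traffic: "a" is evicted before its duplicate arrives.
high_traffic_missed = count_missed_duplicates(["a", "b", "c", "d", "a"], max_entries=3)
```

If this is the cause, the number of distinct identifiers arriving within the age-off window matters more than the window itself, which would fit the observation that raising the age off to 5m made no difference while lower traffic did.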