Tags: google-cloud-platform, google-cloud-storage, apache-nifi, cloud-storage

Problem with duplicate objects added to Google Cloud Storage by the PutGCSObject processor in NiFi


I am using NiFi to send data from a Pub/Sub queue to Cloud Storage. I use the ConsumeGCPubSub processor to fetch data from the queue and the PutGCSObject processor to write it to Cloud Storage. However, PutGCSObject is writing duplicate data to Cloud Storage.

I also see that the duplicated objects have the same MD5 hash in Cloud Storage. What could be causing this, and how can I fix it?
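Identical MD5 hashes mean the objects are byte-for-byte copies of the same content. As a rough local sketch of that check: Cloud Storage reports an object's `md5Hash` as the base64-encoded MD5 digest of its content, so grouping object names by that value exposes duplicates. The object names and payloads below are hypothetical; in practice they would come from listing the bucket.

```python
import base64
import hashlib
from collections import defaultdict


def gcs_md5(data: bytes) -> str:
    """Return the MD5 digest in the base64 form Cloud Storage uses for md5Hash."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def find_duplicates(objects: dict[str, bytes]) -> list[list[str]]:
    """Group object names whose content produces the same md5Hash."""
    by_hash: dict[str, list[str]] = defaultdict(list)
    for name, data in objects.items():
        by_hash[gcs_md5(data)].append(name)
    # Only hash groups containing more than one object are duplicates.
    return [sorted(names) for names in by_hash.values() if len(names) > 1]


if __name__ == "__main__":
    # Hypothetical bucket contents: two objects share the same payload.
    objects = {
        "msg-1": b'{"id": 1}',
        "msg-1-copy": b'{"id": 1}',
        "msg-2": b'{"id": 2}',
    }
    print(find_duplicates(objects))  # prints [['msg-1', 'msg-1-copy']]
```

Matching hashes only prove the content is identical; they do not say where the extra copies came from, which is what the checks below narrow down.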

[Screenshot: Scheduling Properties]

I double-checked the following:

  • The Pub/Sub messages are not duplicated.
  • When I send 30 pieces of data, exactly 30 arrive in NiFi.
  • I checked whether the objects in my Google Cloud Storage bucket contain different data, but they do not.
  • The number of FlowFiles coming from the queue equals the number leaving PutGCSObject as success, yet the data is written over and over again. In NiFi Data Provenance, I found multiple entries with the same FlowFile UUID.
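The provenance observation is the key clue: one FlowFile UUID appearing several times at PutGCSObject means the same FlowFile passed through the processor more than once, rather than Pub/Sub delivering duplicate messages. A minimal sketch of that check, assuming the provenance entries have been exported into a list of (FlowFile UUID, component name) pairs; the export format, the `repeated_flowfiles` helper, and the sample values are all hypothetical.

```python
from collections import Counter


def repeated_flowfiles(events, component="PutGCSObject"):
    """Return {uuid: count} for FlowFiles seen more than once at `component`."""
    counts = Counter(uuid for uuid, comp in events if comp == component)
    return {uuid: n for uuid, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical export: ff-1 reached PutGCSObject three times.
    events = [
        ("ff-1", "ConsumeGCPubSub"),
        ("ff-1", "PutGCSObject"),
        ("ff-1", "PutGCSObject"),
        ("ff-1", "PutGCSObject"),
        ("ff-2", "ConsumeGCPubSub"),
        ("ff-2", "PutGCSObject"),
    ]
    print(repeated_flowfiles(events))  # prints {'ff-1': 3}
```

An empty result would point back at the source; a non-empty one points at the flow routing FlowFiles into PutGCSObject repeatedly.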

Solution

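  • The success relationship of PutGCSObject should be set to terminate (auto-terminated in the processor's relationship settings) instead of being connected back to the processor. Otherwise every FlowFile that is written successfully is routed into PutGCSObject again and uploaded again, which is why the same FlowFile UUID and the same MD5 hash keep appearing.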
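One reading of this answer is that PutGCSObject's success relationship had been connected back to the processor itself. The small simulation below shows why that produces exactly the symptoms in the question. It is a simplified model, not NiFi code, and `simulate` and its parameters are hypothetical: on each scheduler tick the processor takes one FlowFile from its input queue, "uploads" it, and routes it to success. If success loops back to the processor's own input, the same FlowFile (same UUID, same content) is uploaded again and again; if success is auto-terminated, each FlowFile is written once.

```python
from collections import Counter, deque


def simulate(flowfiles, success_loops_back, ticks):
    """Count uploads per FlowFile UUID over a fixed number of scheduler ticks."""
    queue = deque(flowfiles)
    uploads = Counter()
    for _ in range(ticks):
        if not queue:
            break  # nothing left to process
        uuid = queue.popleft()
        uploads[uuid] += 1  # one write to the bucket
        if success_loops_back:
            queue.append(uuid)  # success routed back into the same processor
        # otherwise success is auto-terminated and the FlowFile is dropped
    return uploads


if __name__ == "__main__":
    messages = ["ff-1", "ff-2", "ff-3"]
    print(simulate(messages, success_loops_back=False, ticks=9))  # each UUID uploaded once
    print(simulate(messages, success_loops_back=True, ticks=9))   # each UUID uploaded three times
```

The loop also explains why the success count matched the input count at first glance: each pass through the processor is a genuine success, so nothing in the flow flags the repeated uploads as errors.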