json apache-kafka apache-nifi avro

NiFi: how to produce Avro files with multiple records per file via Kafka


I created a pipeline that takes a single JSON file (an array of 5890 elements, each a record) and sends it via Kafka in Avro format. The producer works fine, but when I read the topic with a consumer I get one FlowFile (an Avro file) per record, so 5890 Avro files. How can I configure the flow to put multiple records into a single Avro file?

I simply use PublishKafkaRecord_0_10 1.5.0 (with JsonTreeReader 1.5.0 and AvroRecordSetWriter 1.5.0) and ConsumeKafka_0_10 1.5.0.


Solution

  • Firstly, NiFi 1.5.0 is from January 2018. Please consider upgrading as this is terribly out of date. NiFi 1.15.3 is the latest as of today.

    Secondly, the *Kafka_0_10 processors target very old versions of Kafka. Are you really running Kafka v0.10? NiFi also has processors for later Kafka versions.

    It would be useful if you provide examples of your input and desired output and what you are actually trying to achieve.

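    If you are looking to consume those messages in NiFi and you want a single FlowFile with many messages, you should use ConsumeKafkaRecord rather than ConsumeKafka. By default ConsumeKafka emits one FlowFile per Kafka message, while ConsumeKafkaRecord passes the messages it receives through a configured Record Reader and Record Writer. This will let you control how many records you'd like to see per 'file'.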

    If your consumer is not NiFi, then either they need to merge on their end, or you need to bundle all your records into one larger message when producing. However, this is not really the point of Kafka as it's not geared towards large messages/files.
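
For the non-NiFi case, the merge step can be sketched as simple batching. The following is a minimal illustration, not a definitive implementation: `batch_records` and the batch size of 1000 are hypothetical, and the list of dictionaries stands in for records already decoded from Kafka messages. The same grouping could be applied on the producing side before bundling records into one message.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_records(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of up to batch_size records, preserving input order.

    Hypothetical helper: each yielded list would become one output file
    (or one bundled message) in a real pipeline.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Assumption: a stand-in for the 5890 records from the question,
# already decoded from individual Kafka messages.
records = [{"id": i} for i in range(5890)]

batches = list(batch_records(records, 1000))
print(len(batches), len(batches[-1]))  # 6 batches; the last holds 890 records
```

In a real consumer, each batch would then be serialized to one Avro file with whichever Avro library is in use. Keeping the batch size modest also keeps any bundled message within Kafka's message size limits.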