Tags: apache-kafka, apache-nifi, parquet, influxdb

How to convert InfluxDB Line Protocol to Parquet in NiFi


I have InfluxDB Line Protocol records coming into NiFi via a ConsumeKafka processor, which are then merged into flowfiles containing 10,000 records each. Now I'd like to convert them to Parquet and store them in HDFS, with the end goal of building Impala tables for the end user. Is there a way to convert Line Protocol to something the PutParquet processor can consume, or another way to convert it to Parquet files?

I did find a custom influxlineprotocolreader processor, however there's very little information and no examples (that I've found) on how to use this processor so I'm not sure if it fits this use case.

Alternatively, I can use Spark to do the conversion and write Parquet files, but I was hoping to do everything in NiFi if at all possible, especially since I haven't found many resources on doing such a conversion in Spark either (I'm new to both Spark and NiFi).


Solution

  • Nothing out of the box in NiFi understands InfluxDB line protocol. You would have to implement something that converts it to a known format such as JSON or Avro, and from there you could go to Parquet. Alternatively, if you implemented an InfluxDbRecordReader, you could use ConvertRecord with that reader and a Parquet writer to go directly between the two.
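
As a rough illustration of the "convert to a known format first" route, here is a minimal Python sketch that turns line protocol text into newline-delimited JSON. It is not part of NiFi: the function names (`split_unescaped`, `parse_line`, `to_json_lines`) and the output layout (`measurement`, `tags`, `fields`, `timestamp`) are assumptions made for this example. It covers the common cases (backslash escapes, quoted string fields, `i`/`u` integer suffixes, booleans) but assumes tag values contain no double quotes and that sections are separated by a single space.

```python
import json

TRUE_VALUES = {"t", "T", "true", "True", "TRUE"}
FALSE_VALUES = {"f", "F", "false", "False", "FALSE"}


def split_unescaped(text, sep):
    """Split on sep, skipping backslash-escaped characters and quoted spans."""
    parts, buf, in_quotes, i = [], [], False, 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            buf.append(text[i:i + 2])  # keep the escape pair intact for now
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
        if ch == sep and not in_quotes:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    parts.append("".join(buf))
    return parts


def unescape(token):
    """Drop the backslash in front of an escaped comma, equals sign, or space."""
    for ch in (",", "=", " "):
        token = token.replace("\\" + ch, ch)
    return token


def parse_field_value(raw):
    """Map a raw field value to a Python str, bool, int, or float."""
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        return raw[1:-1].replace('\\"', '"').replace("\\\\", "\\")
    if raw in TRUE_VALUES:
        return True
    if raw in FALSE_VALUES:
        return False
    if raw[-1:] in ("i", "u"):  # integer / unsigned integer suffix
        return int(raw[:-1])
    return float(raw)


def parse_line(line):
    """Parse one line protocol record into a dict (layout is illustrative)."""
    sections = split_unescaped(line.strip(), " ")
    measurement, *tag_pairs = split_unescaped(sections[0], ",")
    tags = {}
    for pair in tag_pairs:
        key, value = split_unescaped(pair, "=")
        tags[unescape(key)] = unescape(value)
    fields = {}
    for pair in split_unescaped(sections[1], ","):
        key, value = split_unescaped(pair, "=")
        fields[unescape(key)] = parse_field_value(value)
    timestamp = int(sections[2]) if len(sections) > 2 else None
    return {"measurement": unescape(measurement), "tags": tags,
            "fields": fields, "timestamp": timestamp}


def to_json_lines(payload):
    """Convert a block of line protocol text to newline-delimited JSON."""
    lines = (l for l in payload.splitlines()
             if l.strip() and not l.lstrip().startswith("#"))
    return "\n".join(json.dumps(parse_line(l)) for l in lines)
```

One possible way to wire this in, offered as a suggestion rather than a tested flow: run a script like this from a processor such as ExecuteStreamCommand, then feed the resulting JSON to ConvertRecord with a JSON reader and a Parquet writer. For Impala, you may prefer to flatten `tags` and `fields` into top-level columns and supply an explicit schema.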