How can I convert JSON to Avro in Apache NiFi?
For example, the JSON is obtained using the GetTwitter processor. Previous versions seem to support a ConvertJSONToAvro processor.
To me, it looks like the ConvertRecord processor should be used nowadays, i.e. record-oriented processing that reads the JSON with a JSON tree reader and writes it as Avro. But where and how do I specify the schema, especially for a schema as complex as the one obtained from Twitter? Is NiFi somehow guessing the right schema automatically?
In fact, something rather obvious happens:
ConvertRecord Failed to process StandardFlowFileRecord will route to failure: ${schema.name} did not provide appropriate Schema Name
I.e. ConvertRecord succeeds in parsing the JSON, but fails when it tries to apply the Avro writer. So how can I get an Avro representation of the tweets?
You should be able to infer the schema and have that translated automatically to Avro using the modern record processors. Abdelkrim Hadjidj has a great write-up about it, but to summarize, there are two options:
Use the Schema Inference capability in the JsonPathReader or JsonTreeReader implementation you're using in ConvertRecord. This will allow it to infer the schema, and then pass that along to the AvroRecordSetWriter via the schema.name and avro.schema attributes. There is also a schema inference cache (use the provided volatile implementation unless you have other requirements) to improve performance.
Use the InferAvroSchema processor to parse the incoming data and generate an Avro schema.
Note: this processor is no longer included with default builds of NiFi due to space restrictions, but you can manually build the nifi-kite-nar and then deploy it into the $NIFI_HOME/extensions/ directory to load that functionality.
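
To give a feel for what schema inference does with a nested JSON record such as a tweet, here is a minimal Python sketch that derives an Avro-style schema from one parsed JSON object. It is not NiFi's implementation: the helper names (infer_avro_type, infer_avro_record), the sample tweet, and the simplifications (every field is made nullable, array item types come from the first element only, field names are assumed to already be valid Avro names) are assumptions made for illustration.

```python
import json


def infer_avro_type(value, name):
    """Map one parsed JSON value to an Avro type (hypothetical helper)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # Simplification: use the first element's type; empty arrays default to string.
        items = infer_avro_type(value[0], name + "_item") if value else "string"
        return {"type": "array", "items": items}
    if isinstance(value, dict):
        return infer_avro_record(value, name)
    raise TypeError(f"unsupported JSON value for field {name!r}: {value!r}")


def infer_avro_record(obj, name):
    """Build an Avro record schema from a JSON object (hypothetical helper)."""
    fields = []
    for key, value in obj.items():
        inferred = infer_avro_type(value, f"{name}_{key}")
        # Assumption: make every field nullable, since one sample cannot prove
        # that a field is always present.
        field_type = "null" if inferred == "null" else ["null", inferred]
        fields.append({"name": key, "type": field_type})
    return {"type": "record", "name": name, "fields": fields}


# Made-up sample resembling a small part of a tweet.
tweet = json.loads(
    '{"id": 1, "text": "hello",'
    ' "user": {"screen_name": "nifi", "verified": false},'
    ' "entities": {"hashtags": ["avro"]}}'
)
schema = infer_avro_record(tweet, "tweet")
print(json.dumps(schema, indent=2))
```

Nested objects become nested records and arrays become Avro arrays, which is why a deeply nested tweet does not require a hand-written schema when inference is available.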