json twitter apache-nifi avro

Apache NiFi: convert JSON to Avro


How can I convert JSON to Avro in Apache NiFi?

For example, when the JSON is obtained from the GetTwitter processor?

Previous versions seem to support a ConvertJSONToAvro processor. It looks like the ConvertRecord processor should be used nowadays.

That is, use record-oriented processing to read the JSON with a JSON tree reader and write it out as Avro. But where and how do I specify the schema, especially for one as complex as the schema of Twitter data? Does NiFi somehow guess the right schema automatically?

Edit

In fact, something rather obvious happens:

 ConvertRecord Failed to process StandardFlowFileRecord will route to failure: ${schema.name} did not provide appropriate Schema Name

That is, ConvertRecord succeeds in parsing the JSON, but fails when it tries to apply the Avro writer. So how can I get an Avro representation of the tweets?


Solution

  • You should be able to infer the schema and have it translated automatically to Avro using the modern record processors. Abdelkrim Hadjidj has a great write-up about it; to summarize:

    Modern method

    Use the Schema Inference capability in the JsonPathReader or JsonTreeReader implementation you're using in ConvertRecord. The reader then infers the schema and passes it along to the AvroRecordSetWriter via the schema.name and avro.schema attributes. There is also a schema inference cache (use the provided volatile implementation unless you have other requirements) to improve performance.

    Old method

    Use the InferAvroSchema processor to parse the incoming data and generate an Avro schema.

    Note: this processor is no longer included with default builds of NiFi due to space restrictions, but you can manually build the nifi-kite-nar and then deploy it into the $NIFI_HOME/extensions/ directory to load that functionality.
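
Both methods come down to deriving an Avro schema from the JSON itself. To get a feel for what such inference produces, here is a minimal Python sketch. It is not NiFi's implementation: the function name infer_avro_type and the sample tweet are made up for illustration, the type mapping is deliberately simplified, and it looks at a single record only, whereas a real inference step would have to merge the types seen across many records into unions.

```python
import json


def infer_avro_type(value, name):
    """Map one parsed JSON value to an Avro type (simplified, illustrative mapping)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # Assumption: arrays are homogeneous; an empty array falls back to strings.
        items = infer_avro_type(value[0], name + "_item") if value else "string"
        return {"type": "array", "items": items}
    if isinstance(value, dict):
        # Nested objects become nested records with a name derived from the path.
        return {
            "type": "record",
            "name": name,
            "fields": [
                {"name": key, "type": infer_avro_type(val, name + "_" + key)}
                for key, val in value.items()
            ],
        }
    raise TypeError("unsupported JSON value: %r" % (value,))


# Hypothetical, heavily trimmed tweet; real tweets carry many more fields.
tweet = json.loads(
    '{"id": 1, "text": "hello",'
    ' "user": {"screen_name": "nifi", "verified": false},'
    ' "entities": {"hashtags": ["avro"]}}'
)
schema = infer_avro_type(tweet, "tweet")
print(json.dumps(schema, indent=2))
```

The printed JSON is an Avro record schema of the kind a writer needs before it can serialize the data. A single tweet with a null or missing field would yield a different schema than one where the field is populated, which is one reason inference over real Twitter data is harder than this sketch suggests.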