Tags: arrays, json, avro, apache-nifi, hortonworks-data-platform

Apache NiFi: Parse data with UpdateRecord Processor


I'm trying to parse some data in NiFi (1.7.1) using the UpdateRecord processor. The original data are JSON files that I would like to convert to Avro based on a schema. The Avro conversion itself works, but as part of that conversion I also need to map one array element of the JSON data to a different structure in Avro. This is a sample of the input JSON:

{  "geometry" : {
"coordinates" : [ [ 4.963087975800593, 45.76365595859971 ], [ 4.962874487781098, 45.76320922779652 ], [ 4.962815443439148, 45.763116079159374 ], [ 4.962744732112515, 45.763010484202866 ], [ 4.962096825239138, 45.762112721939246 ] ]}  ...}

Its schema (specified in the RecordReader) is:

{  "type": "record",
  "name": "features",
  "fields": [
    {
      "name": "geometry",
      "type": {
        "type": "record",
        "name": "geometry",
        "fields": [
          {
            "name": "coordinatesJson",
            "type": {
              "type": "array",
              "items": {
                "type": "array",
                "items": "double"
              }
            }
          }
        ]
      }
    },
    ....
  ]
} 

As you can see, coordinates is an array of arrays.

I need to convert that data to Avro based on this schema (specified in the RecordWriter):

{
  "name": "outputdata",
  "type": "record",
  "fields": [
    {"name": "coordinatesAvro",
      "type": {
        "type": "array",
        "items" : {
        "type" : "record",
        "name" : "coordinatesAvro",
        "fields" : [ {
          "name" : "X",
          "type" : "double"
        }, {
          "name" : "Y",
          "type" : "double"
        } ]
      }
      }
    },
    .....

  ]
}   

The problem is that I can't map coordinatesJson to coordinatesAvro using RecordPath functions. I tried several mappings, such as:

Property:                            Value:
/coordinatesJson[0..-1]/X            /geometry/coordinatesAvro[*][0]
/coordinatesJson[0..-1]/Y            /geometry/coordinatesAvro[*][1]

It should be a pretty straightforward parsing step, but as I said, I've been going in circles trying to achieve this for a while.

Any help would be really appreciated.


Solution

  • When I run into something like this, I do the following:
    1) Transform the JSON into JSON with the structure I need (in your case, coordinatesAvro) using the ExecuteScript processor. I used ECMAScript because it lets you simply parse the JSON and work with (transform) the objects.
    2) Use ConvertJsonToAvro with one common schema (coordinatesAvro in your case) for both Reader and Writer.
    This works very well, and I have used it on big data workloads. It is one possible solution to your problem.
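
The reshaping in step 1 could look roughly like the sketch below. It covers only the JSON-to-JSON transformation, not the ExecuteScript wiring (reading and writing the flow file content). The function names `toCoordinateRecords` and `reshape` are hypothetical. The input key `coordinates` is taken from the sample data in the question, and treating the first element of each pair as X and the second as Y is an assumption based on the attempted mappings. Only `coordinatesAvro` is produced, because the other fields are elided in the question. The code uses plain ES5-style syntax, on the assumption that the script engine may not support newer syntax.

```javascript
// Hypothetical helper: turn each [x, y] pair into an { X, Y } object,
// matching the record layout of the coordinatesAvro items.
function toCoordinateRecords(pairs) {
  return pairs.map(function (pair) {
    return { X: pair[0], Y: pair[1] };
  });
}

// Hypothetical helper: parse one input document and return a JSON string
// shaped like the writer schema (only coordinatesAvro is filled in here).
function reshape(inputText) {
  var input = JSON.parse(inputText);
  var output = {
    // Assumption: the array lives at geometry.coordinates, as in the sample.
    coordinatesAvro: toCoordinateRecords(input.geometry.coordinates)
  };
  return JSON.stringify(output);
}
```

Inside ExecuteScript, `reshape` would be applied to the incoming flow file content and its result written back out, so that the following JSON-to-Avro step only has to deal with a structure that already matches the target schema.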