Search code examples
jsonapache-nifijolt

Transforming the JSON using JOLT


I am trying to process a nested JSON and flatten it in Apache NiFi, with the help of the JoltTransformation processor by supplying a spec.

Sample JSON:

Input

{
  "product": "astro",
  "init": "2022091400",
  "dataseries": [
    {
      "timepoint": 3,
      "cloudcover": 2,
      "seeing": 6,
      "transparency": 2,
      "lifted_index": 2,
      "rh2m": 3,
      "wind10m": {
        "direction": "N",
        "speed": 3
      },
      "temp2m": 33,
      "prec_type": "none"
    },
    {
      "timepoint": 6,
      "cloudcover": 2,
      "seeing": 6,
      "transparency": 2,
      "lifted_index": 2,
      "rh2m": 1,
      "wind10m": {
        "direction": "NW",
        "speed": 3
      },
      "temp2m": 35,
      "prec_type": "none"
    },
    {
      "timepoint": 9,
      "cloudcover": 1,
      "seeing": 6,
      "transparency": 2,
      "lifted_index": 2,
      "rh2m": 2,
      "wind10m": {
        "direction": "N",
        "speed": 3
      },
      "temp2m": 35,
      "prec_type": "none"
    }
  ]
}

Jolt Spec

[
  {
    "operation": "shift",
    "spec": {
      "product": "product",
      "init": "init",
      "dataseries": {
        "*": {
          "timepoint": "timepoint",
          "cloudcover": "cloudcover",
          "seeing": "seeing",
          "transparency": "transparency",
          "lifted_index": "lifted_index",
          "rh2m": "rh2m",
          "wind10m": {
            "direction": "direction",
            "speed": "speed"
          },
          "temp2m": "temp2m",
          "prec_type": "prec_type"
        }
      }
    }
  }
]

Output

{
  "product" : "astro",
  "init" : "2022091400",
  "timepoint" : [ 3, 6, 9 ],
  "cloudcover" : [ 2, 2, 1 ],
  "seeing" : [ 6, 6, 6 ],
  "transparency" : [ 2, 2, 2 ],
  "lifted_index" : [ 2, 2, 2 ],
  "rh2m" : [ 3, 1, 2 ],
  "direction" : [ "N", "NW", "N" ],
  "speed" : [ 3, 3, 3 ],
  "temp2m" : [ 33, 35, 35 ],
  "prec_type" : [ "none", "none", "none" ]
}

Expected Output

{
  "product" : "astro",
  "init" : "2022091400",
  "timepoint" : 3,
  "cloudcover" : 2,
  "seeing" : 6,
  "transparency" : 2,
  "lifted_index" : 2,
  "rh2m" : 3,
  "direction" : "N",
  "speed" : 3,
  "temp2m" : 33,
  "prec_type" : "none"
},
{
  "product" : "astro",
  "init" : "2022091400",
  "timepoint" : 6,
  "cloudcover" : 2,
  "seeing" : 6,
  "transparency" : 2,
  "lifted_index" : 2,
  "rh2m" : 1,
  "direction" : "NW",
  "speed" : 3,
  "temp2m" : 35,
  "prec_type" : "none"
},
{
  "product" : "astro",
  "init" : "2022091400",
  "timepoint" : 9,
  "cloudcover" : 1,
  "seeing" : 6,
  "transparency" : 2,
  "lifted_index" : 2,
  "rh2m" : 2,
  "direction" : "N",
  "speed" : 3,
  "temp2m" : 35,
  "prec_type" : "none"
}

So my expectation is to have flatten the JSON and have single values for each main object in this case product and init, after which I plan to send this over to the ConvertJsontoSql processor within the NiFi to have the records inserted into PostgresDB.

https://jolt-demo.appspot.com/


Solution

  • No need to write each attribute individually, but just use @ and & wildcards, except for product and init those should be taken after going the tree two levels up such as

    [
      {
        "operation": "shift",
        "spec": {
          "dataseries": {
            "*": {
              "@(2,product)": "[&1].product", 
              "@(2,init)": "[&1].init", // 2 stands for reaching the level of the "init" aatribute, [&1] is for reaching the level of indexes of "dataseries" array and shaping the result as array(nested within square brackets)
              "*": "[&1].&",
              "w*": {
                "*": "[&2].&"
              }
            }
          }
        }
      }
    ]
    

    the demo on the site http://jolt-demo.appspot.com/ is :

    enter image description here