
How to handle null values in a Jolt spec in NiFi?


For the CSV data below, my Jolt spec drops the records that contain null values.

There are two cases to handle:

  1. If Sub-Model1 is empty, the output should still be printed down to Model1.
  2. If Model1 is empty, the output should be printed down to Company (the Agent A3 case).

A null here corresponds to an empty value in the CSV.

CSV data (shown as JSON records)

[
  {
    "Agent": "A1",
    "Location": "L1",
    "Company": "Hyundai",
    "Model1": "Verna",
    "Sub-Model1": "2018"
  },
  {
    "Agent": "A1",
    "Location": "L1",
    "Company": "Hyundai",
    "Model1": "Creta",
    "Sub-Model1": null
  },
  {
    "Agent": "A1",
    "Location": "L1",
    "Company": "Hyundai",
    "Model1": "Aura",
    "Sub-Model1": null
  },
  {
    "Agent": "A2",
    "Location": "L1",
    "Company": "Toyota",
    "Model1": "Fortuner",
    "Sub-Model1": "2020"
  },
  {
    "Agent": "A3",
    "Location": "L1",
    "Company": "BMW",
    "Model1": null,
    "Sub-Model1": null
  }
]


Jolt spec:

[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "@(0,Agent)": "@(1,Agent).Agent",
        "@(0,Location)": "@(1,Agent).loc_id",
        "Sub-Model1": "@(1,Agent).Company.@(1,Company).@(1,Model1).@(1,Sub-Model1)"
      }
    }
  },
  {
    "operation": "cardinality",
    "spec": {
      "*": { // A1, A2
        "loc_id": "ONE",
        "Agent": "ONE"
      }
    }
  },
  {
    "operation": "shift",
    "spec": {
      "*": { // A1, A2
        "*": "[#2].&",
        "Company": {
          "*": { // Hyundai
            "*": { // Verna
              "*": { // 2018
                "@": "[#6].&4.&3[#3].&2.[#1].&"
              }
            }
          }
        }
      }
    }
  },
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "*": {
        "*": {
          "*": {
            "*": {
              "*": {
                "*": {
                  "*": []
                }
              }
            }
          }
        }
      }
    }
  }
]

Output: This is the JSON I'm getting, but I want to print whatever data is available. Nothing is printed for Creta and Aura, and BMW is missing as well.

[
  {
    "Agent": "A1",
    "loc_id": "L1",
    "Company": {
      "Hyundai": [
        {
          "Verna": [
            {
              "2018": []
            }
          ]
        }
      ]
    }
  },
  {
    "Agent": "A2",
    "loc_id": "L1",
    "Company": {
      "Toyota": [
        {
          "Fortuner": [
            {
              "2020": []
            }
          ]
        }
      ]
    }
  },
  {
    "Agent": "A3",
    "loc_id": "L1"
  }
]

Expected output

[
  {
    "Agent": "A1",
    "loc_id": "L1",
    "Company": {
      "Hyundai": [
        {
          "Verna": [
            {
              "2018": []
            }
          ]
        },
        {
          "Creta": [
            {}
          ]
        },
        {
          "Aura": [
            {}
          ]
        }
      ]
    }
  },
  {
    "Agent": "A2",
    "loc_id": "L1",
    "Company": {
      "Toyota": [
        {
          "Fortuner": [
            {
              "2020": []
            }
          ]
        }
      ]
    }
  },
  {
    "Agent": "A3",
    "loc_id": "L1",
    "Company": {
      "BMW": [
        {}
      ]
    }
  }
]
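
As a cross-check of the expected output above, the two rules can be sketched in plain Python. This is not Jolt and does not run inside NiFi; the function name `group_by_agent` and the `records` list are illustrative assumptions that simply mirror the sample data from the question.

```python
import json

# Sample records copied from the question; None stands for an empty CSV value.
records = [
    {"Agent": "A1", "Location": "L1", "Company": "Hyundai", "Model1": "Verna", "Sub-Model1": "2018"},
    {"Agent": "A1", "Location": "L1", "Company": "Hyundai", "Model1": "Creta", "Sub-Model1": None},
    {"Agent": "A1", "Location": "L1", "Company": "Hyundai", "Model1": "Aura", "Sub-Model1": None},
    {"Agent": "A2", "Location": "L1", "Company": "Toyota", "Model1": "Fortuner", "Sub-Model1": "2020"},
    {"Agent": "A3", "Location": "L1", "Company": "BMW", "Model1": None, "Sub-Model1": None},
]


def group_by_agent(records):
    """Nest records per agent, stopping at the first null level."""
    agents = {}  # dicts keep insertion order, so agents appear in first-seen order
    for rec in records:
        # The first record seen for an agent supplies its Agent and loc_id values.
        entry = agents.setdefault(
            rec["Agent"],
            {"Agent": rec["Agent"], "loc_id": rec["Location"], "Company": {}},
        )
        models = entry["Company"].setdefault(rec["Company"], [])
        if rec["Model1"] is None:
            models.append({})  # rule 2: print down to Company only
        elif rec["Sub-Model1"] is None:
            models.append({rec["Model1"]: [{}]})  # rule 1: print down to Model1
        else:
            models.append({rec["Model1"]: [{rec["Sub-Model1"]: []}]})
    return list(agents.values())


if __name__ == "__main__":
    print(json.dumps(group_by_agent(records), indent=2))
```

A reference like this can be handy for comparing against whatever a candidate Jolt spec produces, since the two results can be diffed as JSON.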

Solution

  • You can use the else case of the notNull function within a modify transformation to replace the null values with a placeholder, and then partition the attributes into sub-objects by their respective Agent values, such as:

    [ 
      { // apply conversion for the null elements of "Model1" and "Sub-Model1" attributes
        "operation": "modify-overwrite-beta",
        "spec": {
          "*": {
            "*odel1": ["=notNull", " "] // the values are overwritten provided the attributes have null values
          }
        }
      },
      { // group the objects by their "Agent" values
        "operation": "shift",
        "spec": {
          "*": {
            "Agent": "@1,Agent.&",
            "Location": "@1,Agent.loc_id",
            "# ": "@1,Agent.@1,Company[0].@1,Model1[].@1,Sub-Model1"
          }
        }
      },
      { // get rid of the keys of the objects 
        "operation": "shift",
        "spec": {
          "*": {
            "@": ""
          }
        }
      },
      { // get rid of the attributes with " " keys which had been used above for transformation as the else case
        "operation": "remove",
        "spec": {
          "*": {
            "*": {
              "*": {
                " ": "",
                "*": {
                  "*": {
                    " ": ""
                  }
                }
              }
            }
          }
        }
      },
      { // replace the remaining leaf values with empty arrays
        "operation": "modify-overwrite-beta",
        "spec": {
          "*": {
            "*": {
              "*": {
                "*": {
                  "*": {
                    "*": ["= ", []]
                  }
                }
              }
            }
          }
        }
      },
      { // reduce the number of the repeating components to one
        "operation": "cardinality",
        "spec": {
          "*": {
            "Agent": "ONE",
            "loc_id": "ONE"
          }
        }
      }
    ]
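
The placeholder idea behind the spec above (fill nulls with " " so that every record has a complete path, then strip the " " keys afterwards) can be illustrated outside Jolt. The following is a minimal Python sketch of that idea only, not of the Jolt engine; `PLACEHOLDER`, `fill_nulls`, and `strip_placeholder` are hypothetical names introduced for illustration.

```python
PLACEHOLDER = " "  # the same single-space stand-in the spec uses


def fill_nulls(record, keys=("Model1", "Sub-Model1")):
    """Return a copy of record with null values under `keys` replaced by the placeholder."""
    return {
        k: (PLACEHOLDER if k in keys and v is None else v)
        for k, v in record.items()
    }


def strip_placeholder(node):
    """Recursively drop every dict entry whose key is the placeholder."""
    if isinstance(node, dict):
        return {k: strip_placeholder(v) for k, v in node.items() if k != PLACEHOLDER}
    if isinstance(node, list):
        return [strip_placeholder(item) for item in node]
    return node


if __name__ == "__main__":
    bmw = {"Agent": "A3", "Location": "L1", "Company": "BMW", "Model1": None, "Sub-Model1": None}
    print(fill_nulls(bmw))
    # A nested structure roughly as it would look after grouping, before cleanup.
    print(strip_placeholder({"BMW": [{PLACEHOLDER: [{PLACEHOLDER: PLACEHOLDER}]}]}))
```

One design caveat: the approach relies on " " never occurring as a genuine Model1 or Sub-Model1 value; if it could, a different placeholder would be needed.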