Search code examples
jsonjolt

Extract data from nested JSON arrays into simpler array


I have a JSON formatted payload with nested arrays that I want to modify into a simpler array. Input looks like this:

[
  {
    "pim_network": [
      {
        "addresses": [
          {
            "ifindex": 1,
            "ifname": "lo",
            "address": "00:00:00:00:00:00",
            "broadcast": "00:00:00:00:00:00",
            "addr_info": [
              {
                "family": "inet",
                "local": "127.0.0.1",
                "prefixlen": 8,
                "label": "lo"
              },
              {
                "family": "inet6",
                "local": "::1",
                "prefixlen": 128
              }
            ]
          },
          {
            "ifindex": 2,
            "ifname": "ens33",
            "address": "00:0c:29:02:b8:cd",
            "broadcast": "ff:ff:ff:ff:ff:ff",
            "addr_info": [
              {
                "family": "inet",
                "local": "192.168.119.129",
                "prefixlen": 24,
                "broadcast": "192.168.119.255",
                "label": "ens33"
              },
              {
                "family": "inet6",
                "local": "fe80::20c:29ff:fe02:b8cd",
                "prefixlen": 64
              }
            ]
          },
          {
            "ifindex": 3,
            "ifname": "docker0",
            "address": "02:42:36:ae:50:55",
            "broadcast": "ff:ff:ff:ff:ff:ff",
            "addr_info": [
              {
                "family": "inet",
                "local": "172.17.0.1",
                "prefixlen": 16,
                "broadcast": "172.17.255.255",
                "label": "docker0"
              }
            ]
          }
        ]
      }
    ]
  }
]

Desired output is (some key:value pairs are used twice for the same MAC-adress for ipv4 and ipv6 data):

[
  {
    "pim_network": [
      {
        "ifindex": 1,
        "ifname": "lo",
        "address": "00:00:00:00:00:00",
        "broadcast": "00:00:00:00:00:00",
        "family": "inet",
        "local": "127.0.0.1",
        "prefixlen": 8,
        "label": "lo"
      },
      {
        "ifindex": 1,
        "ifname": "lo",
        "address": "00:00:00:00:00:00",
        "broadcast": "00:00:00:00:00:00",
        "family": "inet6",
        "local": "::1",
        "prefixlen": 128
      },
      {
        "ifindex": 2,
        "ifname": "ens33",
        "address": "00:0c:29:02:b8:cd",
        "broadcast": "192.168.119.255",
        "family": "inet",
        "local": "192.168.119.129",
        "prefixlen": 24,
        "label": "ens33"
      },
      {
        "ifindex": 2,
        "ifname": "ens33",
        "address": "00:0c:29:02:b8:cd",
        "broadcast": "ff:ff:ff:ff:ff:ff",
        "family": "inet6",
        "local": "fe80::20c:29ff:fe02:b8cd",
        "prefixlen": 64
      },
      {
        "ifindex": 3,
        "ifname": "docker0",
        "address": "02:42:36:ae:50:55",
        "broadcast": "172.17.255.255",
        "family": "inet",
        "local": "172.17.0.1",
        "prefixlen": 16,
        "label": "docker0"
      }
    ]
  }
]

(Non working) spec so far:

[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "*": "&",
        "pim_network": {
          "*": {
            "addresses": {
              "*": {
                "address": "pim_network.[&1].address",
                "addr_info": {
                  "*": {
                    "local": "pim_network.[&1].address.local"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
]

But then we get output like this:

{
  "pim_network": [
    {
      "address": "00:00:00:00:00:00"
    },
    {
      "address": [
        {
          "local": "::1"
        },
        "00:0c:29:02:b8:cd"
      ]
    },
    {
      "address": "02:42:36:ae:50:55"
    }
  ]
}

We have been struggling to find easy to understand documentation regarding using & (and other lookup functions) in JOLT transformations, but most sites we found are too abstract or assume a deeper knowledge of JOLT. If anybody knows a good site with good tutorials or an online course, please let me know. Me and my colleagues are advanced scripters, but that appears not to be enough to grasp the intricacies of JOLT easily.


Solution

  • Seems that you need to loop within addr_info arrays, then start with deep diving into their leaf nodes to tile in the first spec, and dissipate the outer elements dynamically in the second spec such as

    [
      {
        "operation": "shift",
        "spec": {
          "*": {
            "pim_network": {
              "*": {
                "addresses": {
                  "*": {
                    "*": "&4.&1.others.&", //the elements other than "addr_info"
                    "addr_info": {
                      "*": {
                        "*": "&6.&3.&2[&1].&"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      },
      {
        "operation": "shift",
        "spec": {
          "*": { //outermost level, eg. the level of "pim_network" 
            "*": { // the indexes(0,1,2) of "pim_network" derived within the previous spec
              "addr_info": {
                "*": {
                  "@2,others": { "*": "&5.&4_&2.&" }, // go two levels up the tree to grab the values of the "others" object
                  "*": "&4.&3_&1.&" // &4 stands for the key literal "pim_network"
                }
              }
            }
          }
        }
      },
      { //get rid of the object keys
        "operation": "shift",
        "spec": {
          "*": {
            "*": "[0].&1[]"//add a wrapper for "pim_network" by [0]
                           //add the literal "pim_network" by &1
          }
        }
      }
    ]
    

    the demo on the site https://jolt-demo.appspot.com/ is :

    enter image description here