Search code examples
pythonjmespath

JMESPath filter/MultiSelect on a large JSON file


Datasource is a Juniper Router routing table output in json (> 3 GB json file). What I finally would like to have is being able to loop over a list of prefixes and get the prefix as-path combination.

#!/usr/bin/env python

import json
from jmespath import search

jsonData = """{
  "route-information": [
    {
      "route-table": [
        {
          "rt": [
            {
              "rt-destination": [
                {
                  "data": "2001:db8:1::/48"
                }
              ],
              "rt-entry": [
                {
                  "as-path": [
                    {
                      "data": "64511 65551 I"
                    }
                  ]
                }
              ]
            },
            {
              "rt-destination": [
                {
                  "data": "2001:db8:2::/48"
                }
              ],
              "rt-entry": [
                {
                  "as-path": [
                    {
                      "data": "65536 64496 I"
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}"""

data = json.loads(jsonData)

query='"route-information"[]."route-table"[].rt[].{\"destination\": \"rt-destination\", path: \"rt-entry\"}'

output = search(query, data)
print(output)

the above query results in:

[{'destination': [{'data': '2001:db8:1::/48'}], 'path': [{'as-path': [{'data': '64511 65551 I'}]}]}, {'destination': [{'data': '2001:db8:2::/48'}], 'path': [{'as-path': [{'data': '65536 64496 I'}]}]}]

Looks like on the right way. But I would to have the prefix : as-path combination, so I would like to get rid of the "data:" and "as-path.data" part (in the actual json file there are many more objects on this level and I try to get rid of them here).

query='"route-information"[]."route-table"[].rt[].{\"destination\": \"rt-destination\", path: \"rt-entry\".\"as-path\"}

and/or

query='"route-information"[]."route-table"[].rt[].{\"destination\": \"rt-destination\", path: \"rt-entry\".\"as-path\".data}'

results in:

[{'destination': [{'data': '2001:db8:1::/48'}], 'path': None}, {'destination': [{'data': '2001:db8:2::/48'}], 'path': None}]

Any ideas why "None" and or/how to proceed?

Another idea was filtering:

query='"route-information"[]."route-table"[].rt[?"rt-destination".data==`2001:db8:2::/48`]'

and then working my way further down to get the as-path. But the query results in [[]] where as

query='"route-information"[]."route-table"[].rt[?"rt-destination".data=="2001:db8:2::/48"]'

leads to

[[{'rt-destination': [{'data': '2001:db8:1::/48'}], 'rt-entry': [{'as-path': [{'data': '64511 65551 I'}]}]}, {'rt-destination': [{'data': '2001:db8:2::/48'}], 'rt-entry': [{'as-path': [{'data': '65536 64496 I'}]}]}]]

so no filtering at all.


Solution

  • If your goal is to have a set of data looking like this:

    [
      {
        "destination":  "2001:db8:1::/48",
        "path": "64511 65551 I"
      },
      {
        "destination": "2001:db8:2::/48",
        "path": "65536 64496 I"
      }
    ]
    

    It can simply be achieved with this query:

    route-information[].route-table[].rt[].{
      destination: rt-destination[0].data, 
      path: rt-entry[0].as-path[0].data
    }
    

    But I suspect, with the lists you have under rt-destination, rt-entry and as-path, that this query might actually make you miss some data.

    Here is one that is less likely going to make you miss data but that is creating a list under path, so the resulting JSON looks like:

    [
      {
        "destination": [
          "2001:db8:1::/48"
        ],
        "path": [
          "64511 65551 I"
        ]
      },
      {
        "destination": [
          "2001:db8:2::/48"
        ],
        "path": [
          "65536 64496 I"
        ]
      }
    ]
    

    And the query is:

    route-information[].route-table[].rt[].{
      destination: rt-destination[*].data, 
      path: rt-entry[*].as-path[*].data | [] 
    }
    

    This query makes use of the flatten operator |, which is explained in the examples under working with nested data.


    Here is a script demonstrating this:

    import jmespath
    from pprint import pprint
    
    data = {
      "route-information": [
        {
          "route-table": [
            {
              "rt": [
                {
                  "rt-destination": [
                    {
                      "data": "2001:db8:1::/48"
                    }
                  ],
                  "rt-entry": [
                    {
                      "as-path": [
                        {
                          "data": "64511 65551 I"
                        }
                      ]
                    }
                  ]
                },
                {
                  "rt-destination": [
                    {
                      "data": "2001:db8:2::/48"
                    }
                  ],
                  "rt-entry": [
                    {
                      "as-path": [
                        {
                          "data": "65536 64496 I"
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
    
    query = [
        '''
        "route-information"[]."route-table"[].rt[].{
            destination: "rt-destination"[0].data,
            path: "rt-entry"[0]."as-path"[0].data
        }
        ''',
        '''
        "route-information"[]."route-table"[].rt[].{
            destination: "rt-destination"[*].data,
            path: "rt-entry"[*]."as-path"[*].data | []
        }
        '''
    ]
    
    pprint(jmespath.search(query[0], data))
    print('---------------------------')
    pprint(jmespath.search(query[1], data))
    

    Prints:

    [{'destination': '2001:db8:1::/48', 'path': '64511 65551 I'},
     {'destination': '2001:db8:2::/48', 'path': '65536 64496 I'}]
    ---------------------------
    [{'destination': ['2001:db8:1::/48'], 'path': ['64511 65551 I']},
     {'destination': ['2001:db8:2::/48'], 'path': ['65536 64496 I']}]