Tags: python, amazon-web-services, jmespath

Complex JMESPath filter on a large JSON file


Please consider the following JSON extract (the real data is much larger; this is a short piece of it that I'm trying to get working):

jsonData = """{
  "products" : {
    "DQ578CGN99KG6ECF" : {
      "sku" : "DQ578CGN99KG6ECF",
      "productFamily" : "Compute",
      "attributes" : {
        "location" : "US East (N. Virginia)",
        "instanceType" : "hs1.8xlarge",
        "tenancy" : "Shared",
        "operatingSystem" : "Windows",
        "licenseModel" : "License Included",
        "preInstalledSw" : "NA"
      }
    },
    "G2N9F3PVUVK8ZTGP" : {
      "sku" : "G2N9F3PVUVK8ZTGP",
      "productFamily" : "Instance",
      "attributes" : {
        "location" : "Asia Pacific (Seoul)",
        "instanceType" : "i2.xlarge",
        "tenancy" : "Host",
        "operatingSystem" : "Windows",
        "licenseModel" : "License Included",
        "preInstalledSw" : "SQL Server Enterprise"
      }
    },
    "FBZZ2TKXWWY5HZRX" : {
      "sku" : "FBZZ2TKXWWY5HZRX",
      "productFamily" : "Compute",
      "attributes" : {
        "location" : "Asia Pacific (Seoul)",
        "instanceType" : "i2.4xlarge",
        "tenancy" : "Dedicated",
        "operatingSystem" : "SUSE",
        "licenseModel" : "No License required",
        "preInstalledSw" : "NA"
      }
    }
  }
}"""

I can't work out a filter that finds, say, all products with "Windows" as operatingSystem and "Shared" as tenancy.

I got to this point:

import json
import jmespath

priceJson = json.loads(jsonData)
query = "products.*.attributes[?operatingSystem=='Windows' && tenancy=='Shared']"
output_dict = jmespath.search(query, priceJson)

However, I lose the sku this way.

Result:

[
  {
    "location" : "US East (N. Virginia)",
    "instanceType" : "hs1.8xlarge",
    "tenancy" : "Shared",
    "operatingSystem" : "Windows",
    "licenseModel" : "License Included",
    "preInstalledSw" : "NA"
  }
]

What I'd like to get:

[
  {
    "sku" : "DQ578CGN99KG6ECF",
    "attributes" : {
      "location" : "US East (N. Virginia)",
      "instanceType" : "hs1.8xlarge",
      "tenancy" : "Shared",
      "operatingSystem" : "Windows",
      "licenseModel" : "License Included",
      "preInstalledSw" : "NA"
    }
  }
]

Any idea how to get that result?


Solution

  • I kept looking for an answer and finally got the result I wanted.

    The key was to do it in two steps: first flatten each product so the sku sits next to its attributes, then filter the flattened list.

    This is the code I use now:

    #!/usr/bin/env python
    # Note: urlopen is not used in the snippet below.
    try:
        # For Python 3.0 and later
        from urllib.request import urlopen
    except ImportError:
        # Fall back to Python 2's urllib2
        from urllib2 import urlopen
    
    import json
    import jmespath
    
    jsonData = """{
      "products" : {
        "DQ578CGN99KG6ECF" : {
          "sku" : "DQ578CGN99KG6ECF",
          "productFamily" : "Compute",
          "attributes" : {
            "location" : "US East (N. Virginia)",
            "instanceType" : "hs1.8xlarge",
            "tenancy" : "Shared",
            "operatingSystem" : "Windows",
            "licenseModel" : "License Included",
            "preInstalledSw" : "NA"
          }
        },
        "G2N9F3PVUVK8ZTGP" : {
          "sku" : "G2N9F3PVUVK8ZTGP",
          "productFamily" : "Instance",
          "attributes" : {
            "location" : "Asia Pacific (Seoul)",
            "instanceType" : "i2.xlarge",
            "tenancy" : "Host",
            "operatingSystem" : "Windows",
            "licenseModel" : "License Included",
            "preInstalledSw" : "SQL Server Enterprise"
          }
        },
        "FBZZ2TKXWWY5HZRX" : {
          "sku" : "FBZZ2TKXWWY5HZRX",
          "productFamily" : "Compute",
          "attributes" : {
            "location" : "Asia Pacific (Seoul)",
            "instanceType" : "i2.4xlarge",
            "tenancy" : "Dedicated",
            "operatingSystem" : "SUSE",
            "licenseModel" : "No License required",
            "preInstalledSw" : "NA"
          }
        }
      }
    }"""
    
    priceJson = json.loads(jsonData)
    
    # Step 1: flatten every product into one object that carries the sku
    # alongside the attributes needed for filtering.
    query = (
        "products.*.{sku: sku, location: attributes.location, "
        "instanceType: attributes.instanceType, tenancy: attributes.tenancy, "
        "operatingSystem: attributes.operatingSystem, "
        "licenseModel: attributes.licenseModel, "
        "preInstalledSw: attributes.preInstalledSw}"
    )
    flat_products = jmespath.search(query, priceJson)
    
    # Step 2: filter the flattened list.
    query2 = "[?operatingSystem=='Windows' && tenancy=='Shared']"
    output_dict = jmespath.search(query2, flat_products)
    
    print(output_dict)
    

    The result (shown here formatted as JSON):

    [
      {
        "preInstalledSw": "NA",
        "location": "US East (N. Virginia)",
        "sku": "DQ578CGN99KG6ECF",
        "operatingSystem": "Windows",
        "tenancy": "Shared",
        "instanceType": "hs1.8xlarge",
        "licenseModel": "License Included"
      }
    ]