Search code examples
jsonnestedjqdata-manipulation

Use JQ to parse JSON nested objects, using select to match key-value in nested object while showing existing structure


Use JQ to parse JSON nested objects, using select to match key-value in nested object while showing existing structure

I am trying to take a complex JSON file of 20,000+ lines and extract a specific key while retaining the surrounding metadata which adds necessary human-understandable context.


Data source (complex structure):

{
  "Marketplace": [
    {
      "Level1Name": "Company A Products",
      "Level1Array": [
        {
          "Level2Name": "USA Products List",
          "Level2Contents": [
            {
              "Level3Name": "ALL",
              "Level3URL": "https://a.com/products"
            },
            {
              "Level3Name": "Subset1001",
              "Level3URL": "https://a.com/products/subset1001"
            }
          ]
        }
      ]
    },
    {
      "Level1Name": "Company B Products",
      "Level1Array": [
        {
          "Level2Name": "USA Products List",
          "Level2Contents": [
            {
              "Level3Name": "ALL",
              "Level3URL": "https://b.com/products"
            },
            {
              "Level3Name": "Subset500",
              "Level3URL": "https://b.com/products/subset500"
            }
          ]
        },
        {
          "Level2Name": "EU Products List",
          "Level2Contents": [
            {
              "Level3Name": "ALL",
              "Level3URL": "https://b.eu/products"
            },
            {
              "Level3Name": "Subset200",
              "Level3URL": "https://b.eu/products/subset200"
            }
          ]
        }
      ]
    },
    {
      "Level1Name": "Company X Products",
      "Level1Array": [
        {
          "Level2Name": "Deleted Products",
          "Level2URL": "https://internal.x.com/products"
        }
      ]
    }
  ]
}

JQ command currently used to extract removes all other contextual metadata...

jq -r '(
         .Marketplace[].Level1Array[].Level2Contents[]
         | select (.Level3Name | index("ALL"))
         | [.]
         )'

Output given...

[
  {
    "Level3Name": "ALL",
    "Level3URL": "https://a.com/products"
  }
]
[
  {
    "Level3Name": "ALL",
    "Level3URL": "https://b.com/products"
  }
]
[
  {
    "Level3Name": "ALL",
    "Level3URL": "https://b.eu/products"
  }
]

Option 1 output desired, same JSON structure with removing all other objects which does not match select filter "ALL" string criteria

{
    "Marketplace":
  [
        {
            "Level1Name": "Company A Products",
            "Level1Array": [
                {
                    "Level2Name": "USA Products List",
                    "Level2Contents": [
                        {
                            "Level3Name": "ALL",
                            "Level3URL": "https://a.com/products"
                        }
                    ]
                }
            ]
        },
        {
            "Level1Name": "Company B Products",
            "Level1Array": [
                {
                    "Level2Name": "USA Products List",
                    "Level2Contents": [
                        {
                            "Level3Name": "ALL",
                            "Level3URL": "https://b.com/products"
                        }
                    ]
                },
                {
                    "Level2Name": "EU Products List",
                    "Level2Contents": [
                        {
                            "Level3Name": "ALL",
                            "Level3URL": "https://b.eu/products"
                        }
                    ]
                }
            ]
        }
    ]
}

Option 2 output desired, any similar format which can be iterated with loop such as:

{
  "Marketplace":
  [
    {
      "Level1Name": "Company A Products",
      "Level2Name": "USA Products List",
      "Level3Name": "ALL",
      "Level3URL": "https://a.com/products"
    },
    {
      "Level1Name": "Company B Products",
      "Level2Name": "USA Products List",
      "Level3Name": "ALL",
      "Level3URL": "https://b.com/products"
    },
    {
      "Level1Name": "Company B Products",
      "Level2Name": "EU Products List",
      "Level3Name": "ALL",
      "Level3URL": "https://b.eu/products"
    }
  ]
}

Solution

  • The following filter produces the "Option 2" output:

    .Marketplace |= map(
      {Level1Name} as $Level1Name
      | .Level1Array[]
      | {Level2Name} as $Level2Name
      | .Level2Contents[]?
      | select(.Level3Name == "ALL")
      | $Level1Name + $Level2Name + . )
    
    

    Breaking it down ...

    One way to understand this is to consider:

    .Marketplace[]
    | {Level1Name} as $Level1Name
    | .Level1Array[]
    | {Level2Name} as $Level2Name
    | .Level2Contents[]?             # in case .Level2Contents is missing
    | if (.Level3Name == "ALL")
      then $Level1Name + $Level2Name + .
      else empty
      end
    

    Addendum: "Name"

    The OP subsequently asked what can be done if the three levels of "Name" keys are all named "Name". An answer can easily be obtained by tweaking the above, to yield:

    .Marketplace |= map(
      {Level1Name: .Name} as $Level1Name
      | .Level1Array[]
      | {Level2Name: .Name} as $Level2Name
      | .Level2Contents[]?
      | select(.Name == "ALL")
      | $Level1Name + $Level2Name + . )
    

    Output

    In this case, the output would be as follows:

    {
      "Marketplace": [
        {
          "Level1Name": "Company A Products",
          "Level2Name": "USA Products List",
          "Name": "ALL",
          "Level3URL": "https://a.com/products"
        },
        {
          "Level1Name": "Company B Products",
          "Level2Name": "USA Products List",
          "Name": "ALL",
          "Level3URL": "https://b.com/products"
        },
        {
          "Level1Name": "Company B Products",
          "Level2Name": "EU Products List",
          "Name": "ALL",
          "Level3URL": "https://b.eu/products"
        }
      ]
    }