Search code examples
jsonansiblejmespath

JMESPath expression to flatten array of objects, each with nested arrays of objects


I have JSON containing an array of databases, each database has an array of users, e.g.

{"databases": [
  {"db": "db_a", "users": [{"name": "alice"}, {"name": "alex"}]},
  {"db": "db_b", "users": [{"name": "bob"}, {"name": "brienne"}]}
]}

I would like to produce a flat array of databases and users, i.e.

[
  {"db": "db_a", "name": "alice"},
  {"db": "db_a", "name": "alex"},
  {"db": "db_b", "name": "bob"},
  {"db": "db_b", "name": "brienne"}
]

In SQL terms this would be a cartesian join or cartesian product, but I'm not sure of the correct term in a tree structure. The closest I've got so far is

databases[].users[]

which produces

[{"name": "alice"}, {"name": "alex"}, {"name": "bob"}, {"name": "brienne"}]

and

databases[].{db: db, name: users[].name}

which produces

[
  {"db": "db_a", "name": ["alice", "alex"]},
  {"db": "db_b", "name": ["bob", "brienne"]}
]

Addendum: I'm happy to accept "You can't do that with JMESPath, here's why ..." as an answer. An HN Comment`` hints at this

can't reference parents when doing iteration. Why? All options for iteration, [* ] and map, all use the iterated item as the context for any expression. There's no opportunity to get any other values in


Solution

  • As of JEP 18 you can use a let expression, available in jmespath-community on PyPI.

    A let-expression introduces lexical scoping and lets you bind variables that are evaluated In the context of a given lexical scope. This enables queries that can refer to elements defined outside of their current element

    databases[*].let $db = db in users[*].{"db": $db, "name": name}
    

    This produces

    [
      [
        {"db": "db_a", "name": "alice"},
        {"db": "db_a", "name": "alex"}
      ],
      [
        {"db": "db_b", "name": "bob"},
        {"db": "db_b", "name": "brienne"}
      ]
    ]
    

    As pointed out by Alex Willmer, you can flatten this with

    (databases[*].let $db = db in users[(].{"db": $db, "name": name})[]
    

    producing

    [
      { "db": "db_a", "name": "alice" },
      { "db": "db_a", "name": "alex" },
      { "db": "db_b", "name": "bob" },
      { "db": "db_b", "name": "brienne" }
    ]