Search code examples
pythonpandasjmespath

Filter list based on another query result with JMESPath


Having an object such as the one below:

{
  "pick": "a",
  "elements": [
    {"id": "a", "label": "First"},
    {"id": "b", "label": "Second"}
  ]
}

how can I retrieve the item in the elements list where id is equal to the value of pick?

I was trying something like:

elements[?id == pick]

But, apparently, the expression at the right of the comparator is evaluated relative to the object being tested against my filter expression.

How can I achieve what I want? If this is not possible out of the box, do you have any suggestion of where I should start extending JMESPath? Thank you!


Solution

  • Unfortunately, JMESPath does not allow to reference the parent element.

    To circumvent this limitation, in this simple case, you can:

    • read the pick attribute in the first query,
    • create the second query using the value just read,
    • read the wanted content in the second query.

    Actually, thanks to f-strings, two last steps can be performed in a single instruction, so the code can be:

    pck = jmespath.search('pick', dct)
    jmespath.search(f'elements[?id == `{pck}`]', dct)
    

    where dct is the source JSON object.

    A more complex case

    If you have a more complex case (e.g. many such elements, with different pick values in each case), you should use another tool.

    One quite interesting option is to make use of Pandas package.

    Assume that your source dictionary contains:

    dct = {
      "x1": {
        "pick": "a",
        "elements": [
          {"id": "a",    "label": "First_a"},
          {"id": "b",    "label": "Second_a"},
          {"id": "c",    "label": "Third_a"}
        ]
      },
      "x2": {
        "pick": "b",
        "elements": [
          {"id": "a",    "label": "First_b"},
          {"id": "b",    "label": "Second_b"},
          {"id": "c",    "label": "Third_b"}
        ]
      }
    }
    

    The first thing to do is to convert dct into a Pandas DataFrame:

    import pandas as pd
    df = pd.DataFrame.from_dict(dct, orient='index')
    

    The result (printed in a "shortened" form) is:

       pick                                           elements
    x1    a  [{'id': 'a', 'label': 'First_a'}, {'id': 'b', ...
    x2    b  [{'id': 'a', 'label': 'First_b'}, {'id': 'b', ...
    

    Description (if you have no experience in Pandas):

    • x1, x2, ... - the index column - values taken from first level keys in dct.
    • pick - column with (no surprise) pick elements,
    • elements - column with elements (for now each cell contains the whole list).

    This shape is not very much useful, so let's explode elements column:

    df = df.explode('elements')
    

    Now df contains:

       pick                          elements
    x1    a   {'id': 'a', 'label': 'First_a'}
    x1    a  {'id': 'b', 'label': 'Second_a'}
    x1    a   {'id': 'c', 'label': 'Third_a'}
    x2    b   {'id': 'a', 'label': 'First_b'}
    x2    b  {'id': 'b', 'label': 'Second_b'}
    x2    b   {'id': 'c', 'label': 'Third_b'}
    

    This shape is closer to what we need: Each source row has been broken into a couple of rows, each with separate item from the initial list.

    There is one more thing to do, i.e. create a column containing id values, to be later compared with pick column. To do it run:

    df['id'] = df.elements.apply(lambda dct: dct['id'])
    

    Now df contains:

       pick                          elements id
    x1    a   {'id': 'a', 'label': 'First_a'}  a
    x1    a  {'id': 'b', 'label': 'Second_a'}  b
    x1    a   {'id': 'c', 'label': 'Third_a'}  c
    x2    b   {'id': 'a', 'label': 'First_b'}  a
    x2    b  {'id': 'b', 'label': 'Second_b'}  b
    x2    b   {'id': 'c', 'label': 'Third_b'}  c
    

    And to get the final result you should:

    • select rows with pick column == id,
    • take only elements column (together with key column, but this detail Pandas gives you just out of the box).

    The code to do it is:

    df.query('pick == id').elements
    

    giving:

    x1     {'id': 'a', 'label': 'First_a'}
    x2    {'id': 'b', 'label': 'Second_b'}
    

    In the Pandas parlance it is a Series (let's say a list with each element "labelled" with an index.

    Now you can covert it to a dictionary or whatever you wish.