Search code examples
jsonsearchsubstringcontainsjmespath

Filtering JMESPath with a string


I can't find the solution despite a lot of research. I'm stuck with the contains function. I have this Json file:

    {
  "from": "Api",
  "success": true,
  "message": "",
  "errors": [],
  "data": {
    "operations": [
      {
        "IDOperation": 100,
        "DateEcriture": "2019-01-02",
        "Comment": "Invoice Nh5 numero 152",
        "sous_operations": []
      },
      {
        "IDOperation": 101,
        "DateEcriture": "2019-01-02",
        "Comment": "one other thing",
        "sous_operations": []
      },
      {
        "IDOperation":102,
        "DateEcriture": "2019-01-02",
        "Comment": "an other thing",
        "sous_operations": [{"ID-sous-Operation": 103,
                           "DateEcriture": "2019-01-02",
                           "Comment": "Invoice Nh15 numero 341"}]
      }]
   } 
}

And I want to filter the objects having the word "Invoice" in the "Comment" field to get this:

{"operations": [
      {
        "IDOperation": 100,
        "DateEcriture": "2019-01-02",
        "Comment": "Invoice Nh5 numero 152"
      },
      {
        "IDOperation": 103,
        "DateEcriture": "2019-01-02",
        "Comment": "Invoice Nh15 numero 341"
      }]
}

Thanks for your help


Solution

  • You haven't stated which part you're having difficulty with. I'm going to guess it's dealing with the nested sub-operations, since that seems the hardest and least obvious part to me, but I'll try to cover everything.

    Here are my assumptions:

    • Input always consists of an object with a field data.
    • data field is always an object with a field operations.
    • operations is always an array.
    • Every member of operations has the same four fields: IDOperation, DateEcriture, Comment, sous_operations.
    • sous_operations is always an array.
    • Every member of sous_operations has the same three fields: ID-sous-Operation (!), DateEcriture and Comment.
    • In particular, sub-operations are not nested more than one layer deep.
    • All fields called Comment are strings.
    • You want to find both the operations and sub-operations that have "Invoice" (case-insensitive) in their Comment field.
    • You want to output them, but not any sub-operations they might have.
    • You want to rename ID-sous-Operation to IDOperation.
    • Output should consist of an object containing a single field operations which is an array of the selected and transformed operations.

    I think this does what you want:

    {
      operations:
        data.operations|
        map(
          &[
            [
              {
                IDOperation:IDOperation,
                DateEcriture:DateEcriture,
                Comment:Comment            
              }
            ],
            map(
              &{
                IDOperation:"ID-sous-Operation",
                DateEcriture:DateEcriture,
                Comment:Comment
              },
              sous_operations
            )
          ],
          @
        )|
        [][]|
        [?contains(Comment,`"Invoice"`)]
    }
    

    First, we replace every operation with a two-member array. The first member is a single element array containing the operation's fields but not its sub-operations. The second member is an array of all the operation's sub-operations. (At this point we also rename sub-operations ID field.)

    So now we have operations as an array of (two element) arrays of arrays of simplified operations. We use the flatten operator twice to get a straightforward single-level array. Finally we filter it simply using the contains method.

    Here's the output:

    $ jp --filename input1.json --expr-file filter.jmespath
    {
      "operations": [
        {
          "Comment": "Invoice Nh5 numero 152",
          "DateEcriture": "2019-01-02",
          "IDOperation": 100
        },
        {
          "Comment": "Invoice Nh15 numero 341",
          "DateEcriture": "2019-01-02",
          "IDOperation": 103
        }
      ]
    }