Search code examples
javascriptjsonmongodbjsonpathjmespath

mongodb / jsonpath / jmespath expression to filter object with properties that contain special or accented characters


I need to filter the objects that contain special characters in this json. (I can either write a javascript or import and query the data in mongo. )

{
    "Detail": [{
            "name": "somename1",
            "text": "Sometext1"
        },
        {
            "name": "somename2",
            "text": "Sometext!"
        }, {
            "name": "somename3",
            "text": "método"
        }
    ]
}

Expected output

{
    "Detail": [
        {
            "name": "somename2",
            "text": "Sometext!"
        },
        {
            "name": "somename3",
            "text": "método"
        }
    ]
}

Is there a way to use regular expression in jsonpath or jmspath to do this?

I tried various approached to get closer to what I need such as these, but I'm blocked at this point

Detail[].text.contains(`é`) in jmespath

$.[?(@.text=~ ^[a-zA-Z0-9]*$].text in jsonpath

db.test.find({'Detail.text': /[a-zA-Z0-9]*$]/}) in mongodb where 'test' is the collection


Solution

  • I think you should try this tool Jayway JsonPath https://github.com/json-path/JsonPath

    I tried it online here https://jsonpath.herokuapp.com/

    In this picture you can see the output.

    enter image description here

    This is the regular expression I used

    .Detail[?(@.text =~ /(?:^\W+\w*$)|(?:^\w+\W+$)|(?:^\w+\W+\w+$)/)]
    

    I have divided the problem within three groups

    1. The string starts with at least one special characters or not word characters ^\W+ and it is followed by zero or more word characters i.e. $éxito

    2. The string starts with at least one word character ^\w+ and it is followed by zero or more not word characters \w* i.e. culminó, knowledge.

    3. The string starts with at least one word character ^\w+ then it is followed by one or more not word characters \W+ and then at least one word character i.e. beneplácito, now|knowledge, what$$$$happen.

    Each previous case is grouped using a non-capturing group (?:) and the logical operation used is OR (|) because your string could match any of the three groups.

    I also used this tool https://regexr.com/ to prepare the regular expression

    By the way, this is based on this answer JsonPath expression to filter using regex