Search code examples
jsonpostgresqlsemanticsjsonpath

Postgres 11 Jsonpath support


I am evaluating whether Postgres would be suited to parsing a number of quite complicated JSON documents to extract and semantically match entities, eventually filling a relational schema with high integrity.

I have found the use of jsonpath to be very helpful working with these documents and found this article which suggested that Postgres 11 would have support of sorts. However, the docs do not mention this at all

My question is then will support be forthcoming? Also, is this kind of processing suited to Postgres at all? (possibly use Lucene, MongoDb for the parsing and matching then importing into Postgres relational tables somehow?)

An example of the data could be:

```

{
    "event_classes": [
        {
            "name": "American Football", 
            "url": "/sportsapi/v2/american-football", 
            "id": 27
        }, 
        {
            "name": "Athletics", 
            "url": "/sportsapi/v2/athletics", 
            "id": 48
        }, 
        {
            "name": "Aussie Rules", 
            "url": "/sportsapi/v2/aussie-rules", 
            "id": 10000062
        }, 
        {
            "name": "Badminton", 
            "url": "/sportsapi/v2/badminton", 
            "id": 10000069
        }, 
        {
            "name": "Baseball", 
            "url": "/sportsapi/v2/baseball", 
            "id": 5000026
        }
    ]
}

```


Solution

  • As a general point on SQL/Path will be a more terse method to query a JSONB data-structure. It will compile down to traditional methods of querying JSONB. That makes it a parser feature, providing a standard syntax.

    Imho, that standard syntax is substantially better and gives room for future optimizations, however any query on JSON can be done with the PostgreSQL operators you've linked to, it's just not always pretty.

    Finding out if that array contains {"foo":2} is simple.

    WITH t(jsonb) AS ( VALUES ('[{"foo":2, "qux":42},{"bar":2},{"baz":4}]'::jsonb) )
    SELECT *
    FROM t
    WHERE jsonb @> '[{"foo":2}]';
    

    However, it's substantially harder to do get the value of qux given the above.

    WITH t(jsonb) AS ( VALUES ('[{"foo":2, "qux":42},{"bar":2},{"baz":4}]'::jsonb) )
    SELECT e->'qux'
    FROM t
    CROSS JOIN LATERAL jsonb_array_elements(jsonb) AS a(e)
    WHERE t.jsonb @> '[{"foo":2}]'
      AND e @> '{"foo":2}';
    

    But, that's not the end of the world. That's actually a really nice SQL syntax. It's just not JavaScript. With JSON PATH you'll be able to do something,

    SELECT t.json _ '$.[@.foo == 2].qux'
    FROM t
    WHERE t.json _ '$.[@.foo == 2]';
    

    Where _ is some kind of JSONPATH operator. As an aside, you can always create an actual JavaScript stored procedure on the server and run it with node. It's really dirt simple with pl/v8.