Search code examples
pythonpostgresqlsqlalchemyfastapi

How to query a JSONB column that has deeply nested objects in Python/FastAPI, SqlAlchemy, and PostgreSQL


In a PostgreSQL table "private_notion", I have a JSONB column "record_map" that may or may not contain nested objects, E.g.

{
  "blocks": {
    "7a9abf0d-a066-4466-a565-4e6d7a960a37": {
      "name": "block1",
      "value": 1,
      "child": {
        "7a9abf0d-a066-4466-a565-4e6d7a960a37": {
          "name": "block2",
          "value": 2,
          "child": {
            "7a9abf0d-a066-4466-a565-4e6d7a960a37": {
              "name": "block3",
              "value": 3
            }
          }
        },
        "7a9abf0d-a066-4466-a565-4e6d7a960a38": {
          "name": "block4",
          "value": 4,
          "child": {
            "7a9abf0d-a066-4466-a565-4e6d7a960a39": {
              "name": "block5",
              "value": 5,
              "child": {
                "7a9abf0d-a066-4466-a565-4e6d7a960a40": {
                  "name": "block6",
                  "value": 6
                }
              }
            }
          }
        },
      }
    }
  }
}

To retrieve data, We don't know which block has the data we want, we only have the key. Let's assume we are looking for the object with this key "7a9abf0d-a066-4466-a565-4e6d7a960a40", but we don't know that it is located in child block6 of parent block4 and block5. Another request might look for the parent block4 and so on, and I must find the block by it's key.

The entire code looks like this;

async def get_private_notion_page(
        site_uuid: str, page_id: str, db_session: AsyncSession
    ) -> PrivateNotionPage:
       
        page_id_path = f"{page_id}" # page_id looks like this 7a9abf0d-a066-4466-a565-4e6d7a960a37
        path = f"$.** ? (@.{page_id_path})"

        stmt = text(
            f"""
            SELECT jsonb_path_query(record_map, {path})
            FROM private_notion
            WHERE site_id = {site_uuid}
            """
        )

        result = await db_session.execute(stmt)
        result = result.scalars().first()

        if result:
            return result
        else:
            raise PrivateNotionSiteWasNotFound

So I came up with the following query statements which use sqlalchemy "text" method to accept raw SQL query, but jsonb_path_query_array and jsonb_path_query throw similar errors; syntax error at or near "$".

page_id_path = f"{page_id}"
path = f"$.** ? (@.{page_id_path})"
stmt = text(
            f"""
            SELECT jsonb_path_query(record_map, {path})
            FROM private_notion
            WHERE site_id = {site_uuid}
            """
        )

Error:

sqlalchemy.exc.ProgrammingError: (sqlalchemy.dialects.postgresql.asyncpg.ProgrammingError) <class 'asyncpg.exceptions.PostgresSyntaxError'>: syntax error at or near "$"
[SQL: 
             SELECT jsonb_path_query(record_map, $.** ? (@.7a9abf0d-a066-4466-a565-4e6d7a960a37))
             FROM private_notion
             WHERE site_id = 26f52d8e-a380-46ab-9131-e6f7f62c528f
]

I would later learn that "The $** operator is not valid in a SQL query. Instead, you can use the jsonb_path_query_array function to search recursively through all levels of the JSONB object."

Apparently I got the same error after refactoring the code.

page_id_path = f"{page_id}"
path = f"$[*] ? (@ like_regex {page_id_path})"
stmt = text(
            f"""
            SELECT jsonb_path_query_array(record_map -> 'block', {path})
            FROM private_notion
            WHERE site_id = {site_uuid}
            """
        )

Error:

sqlalchemy.exc.ProgrammingError: (sqlalchemy.dialects.postgresql.asyncpg.ProgrammingError) <class 'asyncpg.exceptions.PostgresSyntaxError'>: syntax error at or near "$"
[SQL: 
     SELECT jsonb_path_query_array(record_map -> 'block', $[*] ? (@ like_regex 7a9abf0d-a066-4466-a565-4e6d7a960a37))
     FROM private_notion
     WHERE site_id = 26f52d8e-a380-46ab-9131-e6f7f62c528f
]

My question is two-pronged, what is the error all about? And is there a better way to retrieve a nested object by key in a JSONB column? Thank you for your time.


Solution

  • This extracts entire objects at any level that have your target uuid-based key in them: demo at db<>fiddle

    SELECT jsonb_path_query(record_map, 
                            'strict $.**?(@.keyvalue().key==$target_id)',
                            jsonb_build_object('target_id',
                                               '7a9abf0d-a066-4466-a565-4e6d7a960a37'))
    FROM private_notion
    WHERE site_id = '45bf37be-ca0a-45eb-838b-015c7a89d47b';
    
    jsonb_path_query
    {
        "7a9abf0d-a066-4466-a565-4e6d7a960a37": {
            "name": "block1",
            "child": {
                "7a9abf0d-a066-4466-a565-4e6d7a960a37": {
                    "name": "block2",
                    "child": {
                        "7a9abf0d-a066-4466-a565-4e6d7a960a37": {
                            "name": "block3",
                            "value": 3
                        }
                    },
                    "value": 2
                },
                "7a9abf0d-a066-4466-a565-4e6d7a960a38": {
                    "name": "block4",
                    "child": {
                        "7a9abf0d-a066-4466-a565-4e6d7a960a39": {
                            "name": "block5",
                            "child": {
                                "7a9abf0d-a066-4466-a565-4e6d7a960a40": {
                                    "name": "block6",
                                    "value": 6
                                }
                            },
                            "value": 5
                        }
                    },
                    "value": 4
                }
            },
            "value": 1
        }
    }
    {
        "7a9abf0d-a066-4466-a565-4e6d7a960a37": {
            "name": "block2",
            "child": {
                "7a9abf0d-a066-4466-a565-4e6d7a960a37": {
                    "name": "block3",
                    "value": 3
                }
            },
            "value": 2
        },
        "7a9abf0d-a066-4466-a565-4e6d7a960a38": {
            "name": "block4",
            "child": {
                "7a9abf0d-a066-4466-a565-4e6d7a960a39": {
                    "name": "block5",
                    "child": {
                        "7a9abf0d-a066-4466-a565-4e6d7a960a40": {
                            "name": "block6",
                            "value": 6
                        }
                    },
                    "value": 5
                }
            },
            "value": 4
        }
    }
    {
        "7a9abf0d-a066-4466-a565-4e6d7a960a37": {
            "name": "block3",
            "value": 3
        }
    }

    Note the object duplication through unnesting: they appear both alone as well as inside each matched parent structure.

    1. JSONPath expression needs to be single-quoted. This gets rid of the syntax error:

      ERROR:  syntax error at or near "$"
      LINE 2:                         $.**.7a9abf0d-a066-4466-a565-4e6d7a9...
                                      ^
      
    2. Your uuid-based key inside the JSONPath needs to be double-quoted. This will get rid of a problem inside the expression that would soon follow:

      ERROR:  trailing junk after numeric literal at or near ".7a" of jsonpath input
      LINE 2:                         '$.**.7a9abf0d-a066-4466-a565-4e6d7a...
                                      ^
      
    3. When using .** accessor, default to using strict mode.

    4. You can use the SQLAlchemy JSONPath type to pass the expression.