Search code examples
pythonpandasamazon-web-servicespython-polarsduckdb

Polars read AWS RDS DB with a table containing column of type jsonb


I'm trying to read AWS RDS DB using the following method through polars:

df_rds_table_test = pl.read_database_uri(sql_query, uri)

Postgres DB contains a table with column name 'json_message' of type jsonb and other columns of type String. When the table is read by polars, it treats data type of json_message column as String.

Further, i'm using DuckDB on the polars dataframe to perform SQL operations.

SQL_QUERY_SRC_JOIN = "select id,json_message.amount as amount from df_rds_table_test where id=10020240501"

src_df = duckdb.sql(SQL_QUERY_SRC_JOIN).pl()

I'm getting an exception that states

duckdb.duckdb.BinderException: Binder Error: Cannot extract field 'amount' from expression "json_message" because it is not a struct, union, or json

Not sure if there's a way that we can cast the datatype to jsonb in polars as its not supported. I tried casting json_message to struct but i'm getting error.

Also i tried casting json_message to type JSON in duckdb query which didn't help neither.

Sample json_message:

{
  "amount": 0,
  "c_id": null,
  "branch": "0502",
  "user_id": "U999999"}

Great if someone could please help me to access the json string in the polars dataframe using duckdb.


Solution

  • Are you sure that casting to json doesn't work? It works in this example:

    tbl = duckdb.sql("""
    from (values
       ('{"amount": 0,"c_id": null, "branch": "0502", "user_id": "U999999"}')
    )
    select col0 as json_message
    """)
    
    ┌────────────────────────────────────────────────────────────────────┐
    │                            json_message                            │
    │                              varchar                               │
    ├────────────────────────────────────────────────────────────────────┤
    │ {"amount": 0,"c_id": null, "branch": "0502", "user_id": "U999999"} │
    └────────────────────────────────────────────────────────────────────┘
    
    duckdb.sql("""
        select (json_message::json).amount as amount from tbl
    """)
    
    ┌────────┐
    │ amount │
    │  json  │
    ├────────┤
    │ 0      │
    └────────┘
    

    or with polars:

    df = tbl.pl()
    (
        df
        .with_columns(pl.col("json_message").str.json_decode())
        .select(pl.col("json_message").struct.field("amount"))
    )
    
    ┌────────┐
    │ amount │
    │ ---    │
    │ i64    │
    ╞════════╡
    │ 0      │
    └────────┘