Tags: python, sql, json, postgresql, sqlalchemy

Python SQLAlchemy adds escape characters to a JSON string when inserting into a PostgreSQL JSON column, so queries are not working


I am adding a JSON string to a Postgres jsonb column in Python using the SQLAlchemy library.

The code that creates the JSON string is:

survey_data_series = scto_df.loc[int(i)]
survey_data_json_string = json.dumps(survey_data_series.to_dict())

The code that adds it to the PostgreSQL database is:

def add_record(self, table_name, record_data):
    # look up the mapped class by name and instantiate it with the record data
    record = getattr(self, table_name)(**record_data)
    self.s.add(record)
    self.s.flush()
    return record.id
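For reference, a minimal self-contained sketch of how this helper can be wired up (the model, class, and session setup below are illustrative stand-ins, not my actual code):

import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class SurveyDataJson(Base):
    # hypothetical model standing in for the real survey_data_json table
    __tablename__ = 'survey_data_json'
    id = sa.Column(sa.Integer, primary_key=True)
    surveyid = sa.Column(sa.String)
    survey_data_json = sa.Column(JSONB)

class Database:
    # the attribute name matches table_name, so getattr() can resolve it
    survey_data_json = SurveyDataJson

    def __init__(self, engine):
        self.s = Session(engine)

    def add_record(self, table_name, record_data):
        record = getattr(self, table_name)(**record_data)
        self.s.add(record)
        self.s.flush()
        return record.id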

The JSON prints correctly in the logs:

{"completiondate": "Feb 8, 2022 10:49:29 AM", "submissiondate": "Feb 8, 2022 10:49:29 AM", "starttime": "Feb 8, 2022 10:37:17 AM", "endtime": "Feb 8, 2022 10:49:21 AM", "deviceid": "ac29d854b49a49be", "subscriberid": NaN}

But when it reaches the database, the JSON is wrapped in a string and an escape character is added before every double quote:

"{\"completiondate\": \"Feb 8, 2022 10:49:29 AM\", \"submissiondate\": \"Feb 8, 2022 10:49:29 AM\", \"starttime\": \"Feb 8, 2022 10:37:17 AM\", \"endtime\": \"Feb 8, 2022 10:49:21 AM\", \"deviceid\": \"ac29d854b49a49be\", \"subscriberid\": NaN}"

Because the stored value is a JSON string rather than a JSON object, key lookups return NULL and none of the queries below work:

SELECT JSON_EXTRACT_PATH_TEXT(survey_data_json::json,'starttime') AS starttime
FROM "Suvita".survey_data_json;

SELECT survey_data_json -> 'starttime' AS starttime
FROM "Suvita".survey_data_json;

SELECT (survey_data_json -> 0)--> '\"starttime\"' AS starttime
FROM "Suvita".survey_data_json; -- gives the whole JSON string

SELECT * FROM "Suvita".survey_data_json where survey_data_json->>'deviceid'='ac29d854b49a49be';

My requirement is to be able to query the JSON column by matching a key's value. Please help.

SOLVED: I edited the code to pass the dictionary directly to the SQLAlchemy table object when adding the row; SQLAlchemy converted the dictionary to JSON automatically.

record = self.survey_data_json(surveyid=surveyid, survey_data_json=dict_object)
self.s.add(record)

Thanks @snakecharmerb


Solution

  • By default, SQLAlchemy will automatically call json.dumps on values assigned to a JSON or JSONB column, so it isn't necessary to call it yourself; in fact, doing so leads to double-encoded values, as seen in the question.

    The functions used to encode and decode JSON can be managed by setting the json_serializer and json_deserializer arguments to create_engine.

    If you don't want SQLAlchemy to serialise automatically, pass an identity function like lambda x: x (passing None does not seem to affect the default behaviour).

    import json
    
    import sqlalchemy as sa
    from sqlalchemy.dialects.postgresql import JSONB
    
    engine = sa.create_engine('postgresql+psycopg2:///test', future=True)
    engine2 = sa.create_engine(
        'postgresql+psycopg2:///test', future=True, json_serializer=lambda x: x
    )
    
    tbl = sa.Table(
            't73387703',
            sa.MetaData(),
            sa.Column('id', sa.Integer, primary_key=True),
            sa.Column('data', JSONB),
    )
    tbl.drop(engine, checkfirst=True)
    tbl.create(engine)
    
    d = {'a': 1, 'b': 2}
    
    ins = tbl.insert()
    
    with engine.begin() as conn:
        conn.execute(ins.values(data=json.dumps(d)))
        conn.execute(ins.values(data=d))
    
    with engine2.begin() as conn:
        conn.execute(ins.values(data=json.dumps(d)))
    
    test# select *  from t73387703 order by id;
     id │          data          
    ════╪════════════════════════
      1 │ "{\"a\": 1, \"b\": 2}"
      2 │ {"a": 1, "b": 2}
      3 │ {"a": 1, "b": 2}
    (3 rows)
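
    To satisfy the original requirement of matching rows on a key's value, the same table can be queried through SQLAlchemy once the column holds a real JSONB object; .astext renders Postgres's ->> operator. A sketch reusing the engine and tbl defined above:

    # rows where data->>'a' = '1'; .astext applies the ->> text-extraction operator
    query = sa.select(tbl).where(tbl.c.data['a'].astext == '1')

    with engine.begin() as conn:
        for row in conn.execute(query):
            print(row.id, row.data)

    # only rows 2 and 3 match; row 1 holds a double-encoded JSON string,
    # so the key lookup returns NULL and the row is filtered out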