Tags: python, sql, json, postgresql, sqlalchemy

Python SQLAlchemy adds escape characters to a JSON string when inserting into a PostgreSQL JSON column, so queries are not working


I am adding a JSON string to a Postgres jsonb column in Python using the SQLAlchemy library.

The code that creates the JSON string is:

survey_data_series = scto_df.loc[int(i)]
survey_data_json_string = json.dumps(survey_data_series.to_dict())

The code that adds it to the PostgreSQL database is:

def add_record(self, table_name, record_data):
    # look up the mapped class by name and instantiate it with the record data
    record = getattr(self, table_name)(**record_data)
    self.s.add(record)
    self.s.flush()
    return record.id
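For reference, a minimal self-contained sketch of how this helper can be wired up (the model, class, and session setup below are illustrative stand-ins, not my actual code):

import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class SurveyDataJson(Base):
    # hypothetical model standing in for the real survey_data_json table
    __tablename__ = 'survey_data_json'
    id = sa.Column(sa.Integer, primary_key=True)
    surveyid = sa.Column(sa.String)
    survey_data_json = sa.Column(JSONB)

class Database:
    # the attribute name matches table_name, so getattr() can resolve it
    survey_data_json = SurveyDataJson

    def __init__(self, engine):
        self.s = Session(engine)

    def add_record(self, table_name, record_data):
        record = getattr(self, table_name)(**record_data)
        self.s.add(record)
        self.s.flush()
        return record.id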

The JSON prints correctly in the logs:

{"completiondate": "Feb 8, 2022 10:49:29 AM", "submissiondate": "Feb 8, 2022 10:49:29 AM", "starttime": "Feb 8, 2022 10:37:17 AM", "endtime": "Feb 8, 2022 10:49:21 AM", "deviceid": "ac29d854b49a49be", "subscriberid": NaN}

But when it reaches the database, the JSON is wrapped in a string and an escape character is added before every double quote:

"{\"completiondate\": \"Feb 8, 2022 10:49:29 AM\", \"submissiondate\": \"Feb 8, 2022 10:49:29 AM\", \"starttime\": \"Feb 8, 2022 10:37:17 AM\", \"endtime\": \"Feb 8, 2022 10:49:21 AM\", \"deviceid\": \"ac29d854b49a49be\", \"subscriberid\": NaN}"

Because the stored value is a JSON string rather than a JSON object, key lookups return NULL and none of the queries below work:

SELECT JSON_EXTRACT_PATH_TEXT(survey_data_json::json,'starttime') AS starttime
FROM "Suvita".survey_data_json;

SELECT survey_data_json -> 'starttime' AS starttime
FROM "Suvita".survey_data_json;

SELECT (survey_data_json -> 0)--> '\"starttime\"' AS starttime
FROM "Suvita".survey_data_json; -- gives the whole JSON string

SELECT * FROM "Suvita".survey_data_json where survey_data_json->>'deviceid'='ac29d854b49a49be';

My requirement is to be able to query the JSON column by matching a key's value. Please help.

SOLVED: I edited the code to pass the dictionary directly to the SQLAlchemy table object when adding the row; SQLAlchemy converted the dictionary to JSON automatically.

record = self.survey_data_json(surveyid=surveyid, survey_data_json=dict_object)
self.s.add(record)

Thanks @snakecharmerb


Solution

  • By default, SQLAlchemy will automatically call json.dumps on values assigned to a JSON or JSONB column, so it isn't necessary to call it yourself; in fact, doing so leads to double-encoded values, as seen in the question.

    The functions used to encode and decode JSON can be managed by setting the json_serializer and json_deserializer arguments to create_engine.

    If you don't want SQLAlchemy to serialise automatically, pass an identity function like lambda x: x (passing None does not seem to affect the default behaviour).

    import json
    
    import sqlalchemy as sa
    from sqlalchemy.dialects.postgresql import JSONB
    
    engine = sa.create_engine('postgresql+psycopg2:///test', future=True)
    engine2 = sa.create_engine(
        'postgresql+psycopg2:///test', future=True, json_serializer=lambda x: x
    )
    
    tbl = sa.Table(
            't73387703',
            sa.MetaData(),
            sa.Column('id', sa.Integer, primary_key=True),
            sa.Column('data', JSONB),
    )
    tbl.drop(engine, checkfirst=True)
    tbl.create(engine)
    
    d = {'a': 1, 'b': 2}
    
    ins = tbl.insert()
    
    with engine.begin() as conn:
        conn.execute(ins.values(data=json.dumps(d)))
        conn.execute(ins.values(data=d))
    
    with engine2.begin() as conn:
        conn.execute(ins.values(data=json.dumps(d)))
    
    test# select *  from t73387703 order by id;
     id │          data          
    ════╪════════════════════════
      1 │ "{\"a\": 1, \"b\": 2}"
      2 │ {"a": 1, "b": 2}
      3 │ {"a": 1, "b": 2}
    (3 rows)
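
    To satisfy the original requirement of matching rows on a key's value, the same table can be queried through SQLAlchemy once the column holds a real JSONB object; .astext renders Postgres's ->> operator. A sketch reusing the engine and tbl defined above:

    # rows where data->>'a' = '1'; .astext applies the ->> text-extraction operator
    query = sa.select(tbl).where(tbl.c.data['a'].astext == '1')

    with engine.begin() as conn:
        for row in conn.execute(query):
            print(row.id, row.data)

    # only rows 2 and 3 match; row 1 holds a double-encoded JSON string,
    # so the key lookup returns NULL and the row is filtered out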