Search code examples
pythonpostgresqlsqlalchemytimestamp

How does SQLAlchemy store datetime objects?


I have a PostgreSQL database that used to store timestamps without a timezone (TIMESTAMP WITHOUT TIMEZONE type). The ORM definition of the table looked something like this:

from sqlalchemy import DateTime
from sqlalchemy.orm import mapped_column

class TimeStampTable(db.Model):

    __tablename__ = 'timestamp'

    id = mapped_column(db.Integer, primary_key=True)
    # According to documentation DateTime is converted to naive timestamp 
    example_timestamp = mapped_column(DateTime, nullable=False)

I wanted to change the type of the column to TIMESTAMP WITH TIMEZONE to keep timezone information that was sent by my frontend.

The code for inserting the timestamp sent from the frontend looks something like this:

@app.route('/', methods=['GET'])
def index():
    # Normally fetched from the request but the value comes as string WITH offset
    dt = datetime.strptime('2023-04-10T17:32:55+0200', '%Y-%m-%dT%H:%M:%S%z')
    new_entry = TimeStampTable(example_timestamp=dt)

    db.session.add(new_entry)
    db.session.commit()

    return 'SUCCESS'

After changing my model it looked like this:

from sqlalchemy.dialects.postgresql import TIMESTAMP
from sqlalchemy.orm import mapped_column

class TimeStampTable(db.Model):

    __tablename__ = 'timestamp'

    id = mapped_column(db.Integer, primary_key=True)
    example_timestamp = mapped_column(TIMESTAMP(timezone=True), nullable=False)

Which was giving me the TIMESTAMP WITH TIMEZONE type in my database. Up to this point everything seemed fine. But after investigating on the stored values in the database I became a bit confused.

In my mind (as outlined in this Answer) I thought Postgres would take the DateTime and just strip the offset. So in the example the value inserted to the database would be 2023-04-10 17:32:55.

But it seems that SQLAlchemy is applying the offset first and then inserts the datetime object. So in this case if the database uses UTC the resulting database entry is 2023-04-10 15:32:55.

After migrating to TIMESTAMP WITH TIMEZONE the process of inserting the values to the database remained exactly the same. SQLAlchemy takes the datetime, applies the offset and saves the datetime like 2023-04-10 15:32:55+00.

My question now is: What is the whole purpose of storing a value as datetime with timezone apart from getting a datetime with tzinfo out of the database if the functionality is exactly the same anyway?


Solution

  • The behaviour you describe is not a function of SQLAlchemy per se, it is the behaviour of the DBAPI layer. We can reproduce the issue using just psycopg2:

    import datetime
    from pprint import pprint
    
    import psycopg2
    
    cnxn = psycopg2.connect(
        host="192.168.0.199", user="scott", password="tiger", dbname="test"
    )
    crsr = cnxn.cursor()
    
    tablename = "timestamp"
    
    crsr.execute(f"DROP TABLE IF EXISTS {tablename}")
    crsr.execute(
        f"""\
    CREATE TABLE {tablename} (
    id int primary key,
    description varchar(50),
    ts_no_tz timestamp,
    ts_with_tz timestamptz
    )
    """
    )
    
    midnight_toronto = datetime.datetime(
        2001, 1, 1, 0, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(hours=-5))
    )
    
    crsr.execute(
        f"INSERT INTO {tablename} (id, description, ts_no_tz, ts_with_tz) "
        "VALUES (%(id)s, %(description)s, %(ts_value)s, %(ts_value)s)",
        dict(id=1, description="aware_datetime", ts_value=midnight_toronto),
    )
    crsr.execute(f"SELECT * FROM {tablename} WHERE id = 1")
    pprint(crsr.fetchone())
    """
    (1,
     'aware_datetime',
     datetime.datetime(2001, 1, 1, 5, 0),
     datetime.datetime(2001, 1, 1, 5, 0, tzinfo=datetime.timezone.utc))
    """
    

    We see that psycopg2 has recognized the tz-aware datetime and adjusted the value stored into ts_no_tz by 5 hours. Both values returned have hour=5.

    However, if we pass the datetime values as strings

    midnight_toronto_str = str(midnight_toronto)
    print(midnight_toronto_str)  # 2001-01-01 00:00:00-05:00
    crsr.execute(
        f"INSERT INTO {tablename} (id, description, ts_no_tz, ts_with_tz) "
        "VALUES (%(id)s, %(description)s, %(ts_value)s, %(ts_value)s)",
        dict(id=2, description="aware_string", ts_value=midnight_toronto_str),
    )
    crsr.execute(f"SELECT * FROM {tablename} WHERE id = 2")
    pprint(crsr.fetchone())
    """
    (2,
     'aware_string',
     datetime.datetime(2001, 1, 1, 0, 0),
     datetime.datetime(2001, 1, 1, 5, 0, tzinfo=datetime.timezone.utc))
    """
    

    the offset -05:00 is simply ignored by the server for ts_no_tz and it has hour=0, not hour=5.