sql-server pandas sqlalchemy pyodbc pymssql

missing timezone offset when writing pandas dataframe to MSSQL datetimeoffset column

The timezone offset is set to +00:00 for timestamp entries when writing pandas dataframe to a table on MSSQL database, although the timestamps in the dataframe contain CE(S)T offsets +02:00.

I am creating a simple table with three columns of which one contains datetime timezone aware timestamps. The said column datatype was set to datetimeoffset(7) when the table was created on the SQL server.

name = ['price']*24
values = np.random.rand(24)
starttimes = pd.date_range('2019-09-10 00:00', '2019-09-10 23:00', freq='h', tz='CET')

df = pd.DataFrame({'start_time': starttimes, 'values': values, 'item': name})

engine = sqlalchemy.create_engine('mssql+pymssql://scott:tiger@hostname:port/dbname')

df.to_sql(
    name='test_table',
    con=engine,
    index=False,
    if_exists='append',
    dtype={'start_time': DateTime(timezone=True)}
)

When the dataframe is inserted onto the table in the database, the offset becomes +00:00 instead of +02:00. For example, the original dataframe has an entry 2019-09-10 00:00:00+02:00 which on the server becomes 2019-09-10 00:00:00.0000000 +00:00

The problem persists when I use pyodbc in place of pymssql for which i have a Microsoft ODBC Driver 17 installed.

Solution

I'm not sure about pymssql, but with pyodbc you can insert to a datetimeoffset column if you convert the DataFrame column to string using df.astype:

table_name = "test_table"
with engine.begin() as conn:
    conn.exec_driver_sql(f"DROP TABLE [{table_name}]")
    conn.exec_driver_sql(f"CREATE TABLE [{table_name}] (dto datetimeoffset(7))")

dto = datetime.datetime(2019, 9, 10, 0, 0, 0, tzinfo=datetime.timezone(datetime.timedelta(hours=2)))
df = pd.DataFrame([(dto,)], columns=['dto'])

# convert pandas Timestamp column to string
df = df.astype({'dto': numpy.str})

df.to_sql(table_name, engine, if_exists='append', index=False)

# check result
with engine.begin() as conn:
    print(conn.execute(text(f"SELECT CAST(dto AS VARCHAR(50)) AS dto_str FROM [{table_name}]")).fetchall())
    # [('2019-09-10 00:00:00.0000000 +02:00',)]