I have a sqlalchemy connection and an existing table
database_con = f'mssql://@{server}/{database}?driver={driver}'
engine = create_engine(database_con)
con = engine.connect()
tbl = '#temp'
df = pd.DataFrame({'col1':[1,2,3], 'col2':['a','b','c']})
df.to_sql(
name=tbl,
con=con,
if_exists="append",
index=False
)
I have new data, with new columns, and want to append it to the existing SQL table. I read add column to SQLAlchemy Table stating that it is not possible, but that was 2011.
My workaround is to pd.concat
the new df to the existing one, drop the existing table in SQL, then write the concatenated table in its place
df = pd.DataFrame({'column1':['test_20230925'], 'column2':[234], 'column3':[234.56]})
df_new = pd.concat([data,df])
drop_query = f""" drop table {tbl}"""
con.execute(text(drop_query))
con.commit()
df_new.to_sql(
name=tbl,
con=con,
if_exists="append",
index=False
)
This is fine for small data, but I have some tables that are 10m rows, and it is horribly inefficient to read it in memory, drop, append, and write again to get the new columns in place.
Can anyone suggest if there is a more efficient method?
I also tried:
alter_query = f""" alter table {tbl} add column New_Col varchar(255)"""
con.execute(text(alter_query))
con.commit()
with the error:
sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near the keyword 'column'. (156) (SQLExecDirectW)")
[SQL: alter table #temp add column New_Col varchar(255)]
This error has nothing to do with SQLAlchemy, the column
keyword is simply not necessary in this statement.
Correct would be:
alter table my_table add new_col varchar(255)
And therefore:
alter_query = f"""alter table {tbl} add column New_Col varchar(255)"""
You won't be able to mark the new column as NOT NULL
(unless you provide a default for the existing data).
If the new data has no overlap with the existing content of the table (no duplicates), then you could just alter the table using the statement above and insert the new data. If there is overlap, you would need to query the table and check before inserting. Won't be efficient, but you probably won't do that often.