Search code examples
pythonsqlalchemy

Add column to existing SQL table usig sqlalcehmy


I have a sqlalchemy connection and an existing table

database_con = f'mssql://@{server}/{database}?driver={driver}'

engine = create_engine(database_con)
con = engine.connect()

tbl = '#temp'
df = pd.DataFrame({'col1':[1,2,3], 'col2':['a','b','c']})
df.to_sql(
            name=tbl,
            con=con,
            if_exists="append",
            index=False
        )

I have new data, with new columns, and want to append it to the existing SQL table. I read add column to SQLAlchemy Table stating that it is not possible, but that was 2011.

My workaround is to pd.concat the new df to the existing one, drop the existing table in SQL, then write the concatenated table in its place

df = pd.DataFrame({'column1':['test_20230925'], 'column2':[234],  'column3':[234.56]})
df_new = pd.concat([data,df])

drop_query = f""" drop table {tbl}"""
con.execute(text(drop_query))
con.commit()

df_new.to_sql(
            name=tbl,
            con=con,
            if_exists="append",
            index=False
        )

This is fine for small data, but I have some tables that are 10m rows, and it is horribly inefficient to read it in memory, drop, append, and write again to get the new columns in place.

Can anyone suggest if there is a more efficient method?

I also tried:

alter_query = f""" alter table {tbl} add column New_Col varchar(255)"""
con.execute(text(alter_query))
con.commit()

with the error:

sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near the keyword 'column'. (156) (SQLExecDirectW)")
[SQL:  alter table #temp add column New_Col varchar(255)]

Solution

  • This error has nothing to do with SQLAlchemy, the column keyword is simply not necessary in this statement.

    Correct would be:

    alter table my_table add new_col varchar(255)
    

    And therefore:

    alter_query = f"""alter table {tbl} add column New_Col varchar(255)"""
    

    You won't be able to mark the new column as NOT NULL (unless you provide a default for the existing data).

    If the new data has no overlap with the existing content of the table (no duplicates), then you could just alter the table using the statement above and insert the new data. If there is overlap, you would need to query the table and check before inserting. Won't be efficient, but you probably won't do that often.