i want to put a Panda Dataframe to a IBM i Series / AS400. I already researched a much, but now I am stuck. I already made a lot of queries, where I use pyodbc. For df.to_sql() I should use, as readed on other stacks, sqlalchemy with the ibm_db_sa dialect.
My actual code is:
CONNECTION_STRING = (
"driver={iSeries Access ODBC Driver};"
"System=111.111.111.111;"
"database=TESTDB;"
"uid=USER;"
"pwd=PASSW;"
)
quoted = urllib.parse.quote_plus(CONNECTION_STRING)
engine = create_engine('ibm_db_sa+pyodbc:///?odbc_connect={}'.format(quoted))
create_statement = df.to_sql("TABLETEST", engine, if_exists="append")
the following packages are installed
python 3.9
ibm-db 3.1.3
ibm-db-sa 0.3.7
ibm-db-sa-py3 0.3.1.post1
pandas 1.3.5
pip 22.0.4
setuptools 57.0.0
SQLAlchemy 1.4.39
when I run, i get the following error:
sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42S02', '[42S02] [IBM][System i Access ODBC Driver][DB2 for i5/OS]SQL0204 - COLUMNS in SYSCAT type *FILE not found. (-204) (SQLPrepare)')
[SQL: SELECT "SYSCAT"."COLUMNS"."COLNAME", "SYSCAT"."COLUMNS"."TYPENAME", "SYSCAT"."COLUMNS"."DEFAULT", "SYSCAT"."COLUMNS"."NULLS", "SYSCAT"."COLUMNS"."LENGTH", "SYSCAT"."COLUMNS"."SCALE", "SYSCAT"."COLUMNS"."IDENTITY", "SYSCAT"."COLUMNS"."GENERATED"
FROM "SYSCAT"."COLUMNS"
WHERE "SYSCAT"."COLUMNS"."TABSCHEMA" = ? AND "SYSCAT"."COLUMNS"."TABNAME" = ? ORDER BY "SYSCAT"."COLUMNS"."COLNO"]
[parameters: ('USER', 'TABLETEST')]
(Background on this error at: https://sqlalche.me/e/14/f405)
I think, the dialect could be wrong, because the parameters are the username and the table for the ODBC connection?
AND: I am not really sure, whats the difference between ibm_db_sa and ibm_db?
I tried a few days again, before someone is trying to do this via sqlalchemy should do it via pyodbc.
Here is my working example refering the df_to_sql_bulk_insert function to this (and now I am currently using my system-DSN):
def df_to_sql_bulk_insert(df: pd.DataFrame, table: str, **kwargs) -> str:
df = df.copy().assign(**kwargs)
columns = ", ".join(df.columns)
tuples = map(str, df.itertuples(index=False, name=None))
values = re.sub(r"(?<=\W)(nan|None)(?=\W)", "NULL", (",\n" + " " * 7).join(tuples))
return f"INSERT INTO {table} ({columns})\nVALUES {values}"
cnxn = pyodbc.connect("DSN=XXX")
cursor = cnxn.cursor()
sqlstr = df_to_sql_bulk_insert(df,"DBXXX.TBLXXX")
cursor.execute(sqlstr)
cnxn.commit()