Tags: sqlalchemy, ibm-midrange

df.to_sql with AS400


I want to write a pandas DataFrame to an IBM i Series / AS400. I have already done a lot of research, but now I am stuck. I have already run many queries using pyodbc. For df.to_sql(), as I read in other answers, I should use SQLAlchemy with the ibm_db_sa dialect.

My current code is:

    import urllib.parse

    from sqlalchemy import create_engine

    CONNECTION_STRING = (
        "driver={iSeries Access ODBC Driver};"
        "System=111.111.111.111;"
        "database=TESTDB;"
        "uid=USER;"
        "pwd=PASSW;"
    )

    quoted = urllib.parse.quote_plus(CONNECTION_STRING)
    engine = create_engine('ibm_db_sa+pyodbc:///?odbc_connect={}'.format(quoted))

    # df is an existing pandas DataFrame
    df.to_sql("TABLETEST", engine, if_exists="append")

The following packages are installed:

python 3.9
ibm-db 3.1.3
ibm-db-sa 0.3.7
ibm-db-sa-py3 0.3.1.post1
pandas 1.3.5
pip 22.0.4
setuptools 57.0.0
SQLAlchemy 1.4.39

When I run it, I get the following error:

    sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42S02', '[42S02] [IBM][System i Access ODBC Driver][DB2 for i5/OS]SQL0204 - COLUMNS in SYSCAT type *FILE not found. (-204) (SQLPrepare)')
    [SQL: SELECT "SYSCAT"."COLUMNS"."COLNAME", "SYSCAT"."COLUMNS"."TYPENAME", "SYSCAT"."COLUMNS"."DEFAULT", "SYSCAT"."COLUMNS"."NULLS", "SYSCAT"."COLUMNS"."LENGTH", "SYSCAT"."COLUMNS"."SCALE", "SYSCAT"."COLUMNS"."IDENTITY", "SYSCAT"."COLUMNS"."GENERATED"
    FROM "SYSCAT"."COLUMNS"
    WHERE "SYSCAT"."COLUMNS"."TABSCHEMA" = ? AND "SYSCAT"."COLUMNS"."TABNAME" = ? ORDER BY "SYSCAT"."COLUMNS"."COLNO"]
    [parameters: ('USER', 'TABLETEST')]
    (Background on this error at: https://sqlalche.me/e/14/f405)

I think the dialect could be wrong, because the query parameters are the username and the table name from my ODBC connection?
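
For comparison, the dialect's native URL form, as I understand it from the ibm_db_sa README, would connect through the ibm_db driver instead of ODBC (a sketch; host, port, and credentials are placeholders):

    from sqlalchemy import create_engine

    # Sketch: native ibm_db_sa URL, no ODBC layer involved
    # (host, port, and credentials here are placeholders)
    engine = create_engine("ibm_db_sa://USER:PASSW@111.111.111.111:50000/TESTDB")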

Also: I am not really sure what the difference is between ibm_db_sa and ibm_db?


Solution

  • I tried for a few more days; my advice to anyone attempting this via SQLAlchemy is to do it via pyodbc instead.

    Here is my working example, using the df_to_sql_bulk_insert function from this (and I am now using my system DSN):

      import re

      import pandas as pd
      import pyodbc


      def df_to_sql_bulk_insert(df: pd.DataFrame, table: str, **kwargs) -> str:
          """Build a single multi-row INSERT statement from a DataFrame."""
          df = df.copy().assign(**kwargs)
          columns = ", ".join(df.columns)
          # One "(v1, v2, ...)" tuple per DataFrame row
          tuples = map(str, df.itertuples(index=False, name=None))
          # Replace pandas/Python missing values with SQL NULL
          values = re.sub(r"(?<=\W)(nan|None)(?=\W)", "NULL", (",\n" + " " * 7).join(tuples))
          return f"INSERT INTO {table} ({columns})\nVALUES {values}"


      cnxn = pyodbc.connect("DSN=XXX")
      cursor = cnxn.cursor()
      sqlstr = df_to_sql_bulk_insert(df, "DBXXX.TBLXXX")
      cursor.execute(sqlstr)
      cnxn.commit()
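
    To avoid building the VALUES string by hand, here is a sketch of a parameterized variant using pyodbc's executemany (same df, cursor, and DBXXX.TBLXXX placeholders as above; the driver then handles quoting and NULLs itself):

      # Sketch: let pyodbc bind the values instead of interpolating them
      columns = ", ".join(df.columns)
      placeholders = ", ".join("?" * len(df.columns))
      sqlstr = f"INSERT INTO DBXXX.TBLXXX ({columns}) VALUES ({placeholders})"

      # Map pandas NaN/None to Python None so the driver sends SQL NULL
      rows = [
          tuple(None if pd.isna(v) else v for v in row)
          for row in df.itertuples(index=False, name=None)
      ]

      cursor.executemany(sqlstr, rows)
      cnxn.commit()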