Search code examples
pythonpython-3.xpandasoracle-databasesqlalchemy

pandas df.to_sql to Oracle database datatype inconsistency


Problem

I have a pandas dataframe and I'm trying to use the pd.df.to_sql() function to an Oracle database. My Oracle database is 19.3c. Seems easy enough right? Why won't it work??

I saw in a few other another stackoverflow posts that I should be using sqlalchemy datatypes. Okay. Links:

from sqlalchemy.types import Integer, String
from sqlalchemy.dialects.oracle import NUMBER, VARCHAR2, DATE

oracle_dtypes = {
        'id' : NUMBER(38,0),
        'counts' : Integer,
        'name' : VARCHAR2(50),
        'swear_words' : String(9999)
        'date' : DATE()   
}

df_upload.to_sql(
    "oracle_table", 
    db.engine, 
    schema="SA_COVID", 
    if_exists="replace", 
    index=False
    dtype=oracle_dtypes
)

It never fails to convert random groups to CLOB or some other random datatypes. What should I do?

Things i've tried

I've tried and didn't work...

  • trucating (sending a SQL statement to the db from python) the table first then if_exist="append"
  • using the if_exist="replace"
  • using Oracle specific dialect of sqlalchemy datatypes only
  • using the generic sqlalchmey datatypes only
  • using a mix of both just bc I'm frustrated

Maybe it's an Oracle specific issue?

Things I haven't tried:

Things I haven't tried:

  • Dropping table and just recreating it before insert
  • to_sql adhoc and the send a series of some ALTER TABLE tbl_name MODIFY col_name

Related Links:


Solution

  • Turns out I needed to double check the incoming datatypes from the API into my pandas dataframe (made a dumb assumption the data was clean)... The API was yielding all strings, and using df.info really helped.

    Needed to convert all the the integer, numeric, and dates to the appropriate datatypes in python (that was the main issue), and from there could re-map the the database datatypes. In short...

    API (all strings) --> Python (set datatypes) --> Database (map datatypes using sqlalchemy)

    I used the pd.Int64Dtype() for integer columns with null values, and 'datetime64[ns]' for datetimes.