Search code examples
odbcpyodbcpython-polars

How to connect polars write_database with mssql+pyodbc?


Before polars>=0.16.10 I was using .to_pandas().to_sql() to send a polars dataframe to the database. Now it should be possible to use the wrapper pl.Dataframe.write_database(), however, altough my connection_uri is working for pl.read_database() and pl.read_sql(). It doesn't work for pl.Dataframe.write_database().

This works:

connection_uri = "mssql+pyodbc://username:password@server:port/database"
pl.read_database(query, connection_uri)

# or

pl.read_sql(query, connection_uri)

This doesnt:

df = pl.Dataframe({...})

df.write_database( 
    table_name='dbo.Mytable',
    connection_uri="mssql+pyodbc://username:password@server:port/database",
    if_exists="append",
    engine="sqlalchemy"
    )

»»» sqlalchemy.exc.InterfaceError: (pyodbc.InterfaceError) ('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnect)')

So far I kept using the old method .to_pandas().to_sql(), now with the added .to_pandas(use_pyarrow_extension_array=True).to_sql() after the latest pandas update.

But I find it quite strange that the same connection_uri string works for read_database but not for write_database.


Solution

  • When you use to_pandas().to_sql() what parameters are you putting into the to_sql()?

    From the docs, it would seem you must be feeding it a sqlalchemy engine. Next question, how are you initiating the engine? Is it with the same connection string or is it a different syntax?

    The source code of the pl.DataFrame.write_database method (the relevant part anyway) is this:

    try:
        from sqlalchemy import create_engine
    except ImportError as exc:
        raise ImportError(
            "'sqlalchemy' not found. Install polars with 'pip install polars[sqlalchemy]'."
        ) from exc
    
    engine = create_engine(connection_uri)
    
    # this conversion to pandas as zero-copy
    # so we can utilize their sql utils for free
    self.to_pandas(use_pyarrow_extension_array=True).to_sql(
        name=table_name, con=engine, if_exists=if_exists, index=False
    )
    

    So whatever connection string you're using to create the sqlalchemy engine you're using for the pandas to_sql must work for the polars write_database because all it does is use the pandas to_sql.