Search code examples
pythonuripython-polarsduckdb

How to write a polars dataframe to DuckDB


I am trying to write a Polars DataFrame to a duckdb database. I have the following simple code which I expected to work:

import polars as pl
import duckdb

pldf = pl.DataFrame({'mynum': [1,2,3,4]})
with duckdb.connect(database="scratch.db", read_only=False) as con:
    pldf.write_database(table_name='test_table', connection=con)

However, I get the following error:

sqlalchemy.exc.ArgumentError: Expected string or URL object, got <duckdb.duckdb.DuckDBPyConnection object

I get a similar error if I use the non-default engine='adbc' instead of df.write_database()'s default engine='sqlalchemy'.

So it seemed it should be easy enough to just swap in a URI for my ducdkb database, but I haven't been able to get that to work either. Potentially it's complicated by my being on Windows?


Solution

  • In-memory database. If you just want to use DuckDB to query a polars dataframe, this can simply be achieved as long as the table exists in the current scope.

    duckdb.sql("SELECT * FROM df").show()
    

    Persistent database If you want to use a persistent database, you could install duckdb-engine and write the database using the connection URI string.

    df.write_database(
        table_name='test_table',
        connection="duckdb:///scratch.db",
    )
    

    Reading the data back in using DuckDB works as usual.

    with duckdb.connect(database="scratch.db", read_only=False) as con:
        con.query("SELECT * FROM test_table").show()
    
    ┌───────┐
    │ mynum │
    │ int64 │
    ├───────┤
    │     1 │
    │     2 │
    │     3 │
    │     4 │
    └───────┘