Search code examples
pythonpandasduckdb

How to updated a table (accessed in pandas) in DuckDB Database?


I'm working on one of use case, I have a larger volumes of records created in a duckdb database table, these tables can be accessed in pandas dataframe, do the data manipulations and send them back to DB table. here below I will explain my case.

  1. I have a DB called as MY_DB in Duck DB and a table in it as ROLL_TABLE_A, here it will be queried and converted as a Pandas data frame as _DF.

  2. The same table (ROLL_TABLE_A) can be accessed by multiple users and do the required updates on the data frame _DF.

  3. How to upload the data frame _DF into the same table ROLL_TABLE_A?.

Steps to reproduce:

# Connection and cursor creations
dbas_db_con = duckdb.connect('MY_DB.db')
# list of DB TABLE
dbas_db_con.execute("SHOW TABLES").df()

enter image description here

# Query on DB Table
dbas_db_con.execute("SELECT *FROM ROLL_TABLE_A").df()

enter image description here

# convert database table to pandas table
_df = dbas_db_con.execute("SELECT *FROM ROLL_TABLE_A").df()

Here on _df id field is filled up with multiple user and after updating pandas dataframe it will be as.

enter image description here

Here the updated dataframe to be updated in ROLL_TABLE_A Table in DuckDB.

dbas_db_con.execute("SELECT *FROM ROLL_TABLE_A").df()

On accessing a ROLL_TABLE_A it will produce an output as

enter image description here


Solution

  • Here is a function that takes a dataframe, table name and database path as input and writes the dataframe to the table:

    def df_to_duckdb(df:pd.DataFrame, table:str, db_path:str):
        con = duckdb.connect(database=db_path, read_only=False)
        
        # register the df in the database so it can be queried
        con.register("df", df)
        
        query = f"create or replace table {table} as select * from df"
        con.execute(f"{query}")
    
        con.close()
    

    The part that took me a while to figure out is registering the df as a relation (e.g. table/view) in the database. Registering does not write the df to the database, but is essentially a pointer within the database that references the df in memory.