Tags: parquet, duckdb

DuckDB "Not implemented Error: Writing to HTTP files not implemented"


Using DuckDB, I am trying to write a DataFrame (from a Python script in VS Code) to a Parquet file in an Azure Storage account. I get the error "Not implemented Error: Writing to HTTP files not implemented".

Reading works fine, though: I build the DataFrame from a CSV file stored in a blob container of the same Azure Storage account, and DuckDB reads that CSV without any problem from VS Code.

import duckdb

azure_storage_path = 'https://somename.blob.core.windows.net/the-container-name'
table_name = 'https://somename.blob.core.windows.net/the-container-name/the_csv.csv'

conn = duckdb.connect()
# httpfs lets DuckDB read remote files over HTTP(S)
conn.execute('INSTALL httpfs')
conn.execute('LOAD httpfs')

# Reading the CSV over HTTPS works
conn.execute(f"""
    CREATE OR REPLACE TABLE some_table AS
    SELECT *
    FROM '{table_name}'
    LIMIT 10
""")
df = conn.execute("SELECT * FROM some_table").df()

# Error occurs on the line below
conn.execute(f"COPY (FROM some_table) TO '{azure_storage_path}/ParquetFile.parquet' (FORMAT 'parquet')")

My goal is to convert the CSV into a Parquet file in the Azure container.


Solution

  • The error is correct: HTTP doesn't really do a good job of abstracting filesystems (nor is it designed to), so DuckDB can read files over an HTTP(S) URL but cannot write to one.

    Instead, you can use DuckDB's fsspec support (which, full disclosure, I added):

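    import duckdb
    from fsspec import filesystem

    ACCOUNT_NAME = 'somename'           # storage account name
    ACCOUNT_KEY = '<your-account-key>'  # storage account access key

    # `conn` is the connection from the question, where some_table was created.
    # This line will throw an exception if the appropriate filesystem interface
    # (adlfs, which provides the 'abfs' protocol) is not installed.
    conn.register_filesystem(filesystem('abfs', account_name=ACCOUNT_NAME, account_key=ACCOUNT_KEY))

    conn.execute("COPY (FROM some_table) TO 'abfs://the-container-name/ParquetFile.parquet' (FORMAT 'parquet')")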
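The abfs:// target and the account_name above are written out by hand from the HTTPS URL used in the question. As a minimal sketch, assuming the standard https://<account>.blob.core.windows.net/<container>/<path> endpoint form, a hypothetical helper (blob_https_to_abfs, not part of DuckDB or fsspec) could derive both:

```python
from urllib.parse import urlparse

BLOB_SUFFIX = ".blob.core.windows.net"


def blob_https_to_abfs(https_url: str) -> tuple[str, str]:
    """Split an Azure Blob HTTPS URL into (account_name, abfs:// URL).

    Hypothetical helper: assumes the public endpoint form
    https://<account>.blob.core.windows.net/<container>/<optional blob path>.
    Custom domains and the dfs endpoint are not handled.
    """
    parsed = urlparse(https_url)
    host = parsed.hostname or ""
    if parsed.scheme != "https" or not host.endswith(BLOB_SUFFIX):
        raise ValueError(f"not an Azure Blob HTTPS URL: {https_url!r}")

    # "somename.blob.core.windows.net" -> "somename"
    account_name = host[: -len(BLOB_SUFFIX)]

    # URL path is "/<container>/<blob path>"; the abfs form is "abfs://<container>/<blob path>"
    path = parsed.path.strip("/")
    if not path:
        raise ValueError("URL must include at least a container name")
    return account_name, f"abfs://{path}"


if __name__ == "__main__":
    account, target = blob_https_to_abfs(
        "https://somename.blob.core.windows.net/the-container-name/ParquetFile.parquet"
    )
    print(account)  # somename
    print(target)   # abfs://the-container-name/ParquetFile.parquet
```

The returned account name is what would go into filesystem('abfs', account_name=...), and the URL is the COPY ... TO target.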