I have a DuckDB table whose column names contain spaces, and I'd like to specify a blanket rule that says "for all columns with spaces in their names, replace the spaces with underscores". I know how to do this by converting the table to a Polars DataFrame, but can I do it without the conversion, preferably using DuckDB's relational API?
test.csv
"id","company name"
1,"Walmart"
2,"Amazon"
3,"Apple"
df = duckdb.sql("SELECT * FROM 'test.csv'")
This yields a relation whose company name column keeps the space in its name:
┌───────┬──────────────┐
│  id   │ company name │
│ int64 │   varchar    │
├───────┼──────────────┤
│     1 │ Walmart      │
│     2 │ Amazon       │
│     3 │ Apple        │
└───────┴──────────────┘
df = duckdb.sql("SELECT * FROM 'test.csv'").pl()
# Replace spaces with underscores in column names using Polars
df.columns = [c.replace(" ", "_") for c in df.columns]
df_new = duckdb.sql("SELECT * FROM df")
┌───────┬──────────────┐
│  id   │ company_name │
│ int64 │   varchar    │
├───────┼──────────────┤
│     1 │ Walmart      │
│     2 │ Amazon       │
│     3 │ Apple        │
└───────┴──────────────┘
I'm unable to find any examples of how to do this directly in DuckDB's Python client API, without conversion to Polars. The docs show how to use ALTER TABLE (via raw SQL), but my goal is to do this via Python.
If you are OK with normalizing all non-alphanumeric characters in your column names, you can use the normalize_names parameter of the *_csv_auto functions.
The Python function from_csv_auto:
df = duckdb.from_csv_auto('test.csv', normalize_names=True)
The SQL table function read_csv_auto:
df = duckdb.sql("select * from read_csv_auto('test.csv', normalize_names=True)")
Either way, the spaces are replaced with underscores:
┌───────┬──────────────┐
│  id   │ company_name │
│ int64 │   varchar    │
├───────┼──────────────┤
│     1 │ Walmart      │
│     2 │ Amazon       │
│     3 │ Apple        │
└───────┴──────────────┘