DataBricks: Ingesting CSV data to a Delta Live Table in Python triggers "invalid characters in table name" error - how to set column mapping mode?

First off, can I just say that I am learning DataBricks at the time of writing this post, so I'd like simpler, cruder solutions as well as more sophisticated ones.

I am reading a CSV file like this:

df1 = spark.read.format("csv").option("header", True).load(path_to_csv_file)

Then I'm saving it as a Delta Live Table like this:

df1.write.format("delta").save("table_path")

The CSV headers have characters in them like space and & and /, and I get the error:

AnalysisException: Found invalid character(s) among " ,;{}()\n\t=" in the column names of your schema. Please enable column mapping by setting table property 'delta.columnMapping.mode' to 'name'. For more details, refer to https://docs.databricks.com/delta/delta-column-mapping.html Or you can use alias to rename it.

The documentation I've seen on the issue explains how to set the column mapping mode to 'name' AFTER a table has been created using ALTER TABLE, but does not explain how to set it at creation time, especially when using the DataFrame API as above. Is there a way to do this?

Is there a better way to get CSV into a new table?

UPDATE:

Reading the docs here and here, and inspired by Robert's answer, I tried this first:

spark.conf.set("spark.databricks.delta.defaults.columnMapping.mode", "name")

Still no luck, I get the same error. It's interesting how hard it is for a beginner to write a CSV file with spaces in its headers to a Delta Live Table

Solution

Thanks to Hemant on the Databricks community forum, I have found the answer.

df1.write.format("delta").option("delta.columnMapping.mode", "name")
                         .option("path", "table_path").saveAsTable("new_table")

Now I can either query it with SQL or load it into a Spark dataframe:

SELECT * FROM new_table;

delta_df = spark.read.format("delta").load("table_path")
display(delta_df)

SQL Way

This method does the same thing but in SQL.

First, create a CSV-backed table for your CSV file:

CREATE TABLE table_csv
  USING CSV
  OPTIONS (path '/path/to/file.csv', 'header' 'true', 'mode' 'FAILFAST');

Then create a Delta table using the CSV-backed table:

CREATE TABLE delta_table
  USING DELTA
  TBLPROPERTIES ("delta.columnMapping.mode" = "name")
  AS SELECT * FROM table_csv;

SELECT * FROM delta_table;

I've verified that I get the same error as I did when using Python should I omit the TBLPROPERTIES statement.

I guess the Python answer would be to use spark.sql and run this using Python, that way I could embed the CSV path variable in the SQL.