azure-databricks, delta-lake

Detecting CSV Headers when creating a Databricks Delta Table?


Needless to say, I'm new to Spark, Databricks, and Delta.

I'm trying to create a Delta table using %sql from a simple CSV where the first row is a header row. Unfortunately I can't seem to get the initial CREATE TABLE to recognise the header row in the CSV. (Just to note, I've been using the Databricks quickstart as a guide - https://docs.databricks.com/delta/quick-start.html)

The code I've got in my Databricks notebook is:

%sql
CREATE TABLE people
USING delta
LOCATION '/dbfs/mnt/mntdata/DimTransform/People.csv'

I have also tried using TBLPROPERTIES ("headers" = "true"), but with no success - see below:

%sql
CREATE TABLE people
USING delta
TBLPROPERTIES ("headers" = "true")
AS SELECT *
FROM csv.`/mnt/mntdata/DimTransform/People.csv`

In both cases, the CSV data is loaded into the table, but the header row is just included in the data as the first standard row.

Any ideas how I'd get this %sql CREATE TABLE to recognise the first/header row as a header when loading from a CSV?

Thanks


Solution

  • You may have to do a small workaround because you are using a CSV file rather than JSON or Parquet. Those formats carry a schema with them; a CSV does not.

    So I suggest doing the following:

    %sql
    -- Stage the CSV as a table so Spark parses the header row and infers the schema
    DROP TABLE IF EXISTS tempPeopleTable;
    CREATE TABLE tempPeopleTable
    USING csv
    OPTIONS (path "/mnt/mntdata/DimTransform/People.csv", header "true", inferSchema "true");
    
    -- Copy the parsed data into a Delta table
    CREATE TABLE people
    USING delta
    AS SELECT * FROM tempPeopleTable;
    
    -- Clean up the staging table
    DROP TABLE IF EXISTS tempPeopleTable;
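
    If you'd rather not create and drop a permanent staging table, the same idea works with a temporary view. This is a minimal sketch assuming the same /mnt/mntdata/DimTransform/People.csv path; the view name peopleCsv is just an example:

    %sql
    -- Register the CSV as a temporary view (header row parsed, schema inferred)
    CREATE OR REPLACE TEMPORARY VIEW peopleCsv
    USING csv
    OPTIONS (path "/mnt/mntdata/DimTransform/People.csv", header "true", inferSchema "true");
    
    -- Materialise the parsed rows as a Delta table
    CREATE TABLE people
    USING delta
    AS SELECT * FROM peopleCsv;

    Either way, you can sanity-check the result with something like SELECT * FROM people LIMIT 10 to confirm the header row became column names rather than a data row.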