Search code examples
databricksdelta-lakedatabricks-community-edition

Generated/Default value in Delta table


I'm trying to set default values to column in Delta Lake table, for example:

CREATE TABLE delta.dummy_7 (id INT, yes BOOLEAN, name STRING, sys_date DATE GENERATED ALWAYS AS CAST('2022-01-01' AS DATE), sys_time TIMESTAMP) USING DELTA;
Error in query:
Syntax error at or near 'GENERATED'(line 1, pos 76)

I have tried in SPARK-SQL + Delta Core library: enter image description here

And basically same error using Hive JDBC + Thrift service + Delta Sharing enter image description here

I'm able to create Delta tables without Generated/Default, for example:

CREATE TABLE delta.dummy_6 (id INT, yes BOOLEAN, name STRING, sys_date DATE, sys_time TIMESTAMP) USING DELTA;
spark-sql> DESCRIBE delta.dummy_6;
id                      int
yes                     boolean
name                    string
sys_date                date
sys_time                timestamp

# Partitioning
Not partitioned
Time taken: 0.069 seconds, Fetched 8 row(s)
spark-sql> DESCRIBE HISTORY delta.dummy_6;
0       2022-09-20 16:59:37.689 NULL    NULL    CREATE TABLE    {"description":null,"isManaged":""}      NULL    NULL    NULL    NULL    Serializable    true    {}      NULL    Apache-Spark/3.3.
Time taken: 1.349 seconds, Fetched 1 row(s)

I don't find any documentation related to the Generated/Default specifications in Delta.io

but in Databricks it is documented.

Is it something wrong with my SQL command? or are Generated/Default specifications available in Databricks only?


Solution

  • As of today this is not supported in SQL - see this Github issue.

    You can create a table with generated column using Scala API:

    DeltaTable.createIfNotExists(spark)
      .tableName("delta.dummy_7")
      .addColumn("id", "INT")
      .addColumn("yes", "BOOLEAN")
      .addColumn("name", "STRING")
      .addColumn(DeltaTable.columnBuilder("sys_date")
                           .dataType("DATE")
                           .generatedAlwaysAs("CAST('2022-01-01' AS DATE)")
                           .build()
      )
      .addColumn("sys_time", "TIMESTAMP")
      .execute()