I'm trying to set default values to column in Delta Lake table, for example:
CREATE TABLE delta.dummy_7 (id INT, yes BOOLEAN, name STRING, sys_date DATE GENERATED ALWAYS AS CAST('2022-01-01' AS DATE), sys_time TIMESTAMP) USING DELTA;
Error in query:
Syntax error at or near 'GENERATED'(line 1, pos 76)
I have tried in SPARK-SQL + Delta Core library:
And basically same error using Hive JDBC + Thrift service + Delta Sharing
I'm able to create Delta tables without Generated/Default, for example:
CREATE TABLE delta.dummy_6 (id INT, yes BOOLEAN, name STRING, sys_date DATE, sys_time TIMESTAMP) USING DELTA;
spark-sql> DESCRIBE delta.dummy_6;
id int
yes boolean
name string
sys_date date
sys_time timestamp
# Partitioning
Not partitioned
Time taken: 0.069 seconds, Fetched 8 row(s)
spark-sql> DESCRIBE HISTORY delta.dummy_6;
0 2022-09-20 16:59:37.689 NULL NULL CREATE TABLE {"description":null,"isManaged":""} NULL NULL NULL NULL Serializable true {} NULL Apache-Spark/3.3.
Time taken: 1.349 seconds, Fetched 1 row(s)
I don't find any documentation related to the Generated/Default specifications in Delta.io
but in Databricks it is documented.
Is it something wrong with my SQL command? or are Generated/Default specifications available in Databricks only?
As of today this is not supported in SQL - see this Github issue.
You can create a table with generated column using Scala API:
DeltaTable.createIfNotExists(spark)
.tableName("delta.dummy_7")
.addColumn("id", "INT")
.addColumn("yes", "BOOLEAN")
.addColumn("name", "STRING")
.addColumn(DeltaTable.columnBuilder("sys_date")
.dataType("DATE")
.generatedAlwaysAs("CAST('2022-01-01' AS DATE)")
.build()
)
.addColumn("sys_time", "TIMESTAMP")
.execute()