Tags: azure, apache-spark, apache-spark-sql, azure-synapse-analytics

UPDATE, DROP COLUMN and EXCEPT not working in Spark SQL


I'm working with Spark SQL through an Azure Synapse Apache Spark pool, and when I use the UPDATE command I get this error:

UPDATE destination only supports Delta Sources.

I tried dropping the column I wanted to update and then recreating it from scratch, but the DROP COLUMN command doesn't work either. The error I get is:

DROP COLUMN is only supported with v2 tables.

I also tried using the EXCEPT keyword to select all columns except the one I wanted to update. The command:

SELECT * EXCEPT (NAME) FROM dummytable1;

Even that produces an error:

Syntax error at or near 'NAME'(line 1, pos 17)

Is there some way to make any of these three commands work in Spark SQL through an Azure Synapse Apache Spark pool? If so, please let me know.


Solution

  • In Azure Synapse, the UPDATE statement in Spark SQL is designed to work with Delta Lake tables. If you run UPDATE against a non-Delta table, you get exactly this error.
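
    If you can store the data as a Delta table in the first place, UPDATE works directly. A minimal sketch, assuming an illustrative table name student_delta (spark is the session preconfigured in a Synapse notebook):

    spark.sql("CREATE TABLE student_delta (Name STRING, Grade STRING) USING DELTA")
    spark.sql("INSERT INTO student_delta VALUES ('Alice', 'A'), ('Bob', 'B')")
    # UPDATE succeeds here because the destination is a Delta source.
    spark.sql("UPDATE student_delta SET Grade = 'F' WHERE Name = 'Bob'")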

    Since you asked how to use UPDATE with non-Delta files: you can use PySpark code instead. As an example, I have created a student_table.

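    A minimal sketch of that sample data, with illustrative rows:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Illustrative sample rows; any Name/Age/Grade data works the same way.
    studentsDF = spark.createDataFrame(
        [("Alice", 20, "A"), ("Bob", 21, "B"), ("Cathy", 22, "A")],
        ["Name", "Age", "Grade"],
    )
    studentsDF.show()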

    As you can see, I have updated the Grade value for Bob in the code below:

    from pyspark.sql.functions import when, col

    # "Update" by deriving a new Grade: mark Bob as failed,
    # leave every other row unchanged.
    studentsDF = studentsDF.withColumn(
        "Grade",
        when(col("Name") == "Bob", "F Failed").otherwise(col("Grade")),
    )
    studentsDF.show()
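
    Note that withColumn only changes the in-memory DataFrame; to persist the update, write the result back out, for example (table name is illustrative):

    # Save the updated rows as a new table; overwriting the very table you
    # are reading from in the same job can fail in Spark.
    studentsDF.write.mode("overwrite").saveAsTable("student_table_updated")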
    


    In Spark SQL, the DROP COLUMN operation is supported only on v2 tables, that is, tables backed by Spark's newer DataSource V2 API (Delta Lake tables, for example). Tables backed by the built-in v1 file sources, such as plain Parquet or CSV, do not support it, which is what the error is telling you.
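
    On a Delta (v2) table the SQL form can work. A sketch, assuming the illustrative student_delta table above and a Delta Lake version recent enough to support column mapping, which DROP COLUMN requires:

    # Column mapping must be enabled before Delta accepts DROP COLUMN.
    spark.sql("""
        ALTER TABLE student_delta SET TBLPROPERTIES (
            'delta.columnMapping.mode' = 'name',
            'delta.minReaderVersion' = '2',
            'delta.minWriterVersion' = '5'
        )
    """)
    spark.sql("ALTER TABLE student_delta DROP COLUMN Grade")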

    The PySpark code below will drop a column for you:

    # Drop the Age column from the DataFrame.
    studentsDF = studentsDF.drop("Age")
    studentsDF.show()
    


    For EXCEPT, the error occurs because Spark SQL does not support the syntax SELECT * EXCEPT (column_name). EXCEPT is a set operator between two queries: SELECT * FROM table1 EXCEPT SELECT * FROM table2.
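
    A sketch of the set-operator form, using the dummytable1 table and NAME column from the question (the filter value is illustrative); it returns every row except the matching ones:

    # EXCEPT compares whole rows between two queries; it cannot exclude a column.
    spark.sql("""
        SELECT * FROM dummytable1
        EXCEPT
        SELECT * FROM dummytable1 WHERE NAME = 'Bob'
    """).show()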

    In PySpark you can use exceptAll. Note that it removes rows, not columns; to exclude a column from the result, use drop() as shown above.

    from pyspark.sql.functions import col

    # Build the full set, isolate Bob's rows, then remove them.
    allStudentsDF = studentsDF
    bobDF = studentsDF.filter(col("Name") == "Bob")
    selectedStudentsDF = allStudentsDF.exceptAll(bobDF)
    selectedStudentsDF.show()
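
    For reference, exceptAll keeps duplicate rows (EXCEPT ALL semantics), while DataFrame.subtract deduplicates the result like a plain SQL EXCEPT.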
    
