Tags: azure, pyspark, azure-synapse

How to update rows in a DataFrame in PySpark when the table is in a serverless SQL pool in Azure Synapse


I am trying to update a few rows in a DataFrame using PySpark; the table exists in a serverless SQL pool.

I tried to use the UPDATE command, but the error says it is not supported in a serverless SQL pool. Is there any way to update a few rows in an Azure Synapse serverless SQL table?


Solution

  • If you are updating tables created over an ADLS data source in a serverless pool: per the documentation, serverless SQL pools don't support updating Delta Lake files. You can use a serverless SQL pool to query the latest version of Delta Lake, but you must use an Apache Spark pool in Synapse Analytics to update it (a Delta Lake sketch is included at the end of this answer). For CSV data, the same Spark-pool approach works, as the steps below show.

    Read the data from the path with the code below:

    df = spark.read.load('abfss://<containerName>@<ADLSName>.dfs.core.windows.net/mycsv2.csv', format='csv', header=True)
    


    Update the data according to your requirements with the code below:

    from pyspark.sql.functions import when, col
    df = df.withColumn('<colName>', when(<condition>, '<updateValue>').otherwise(df.<colName>))
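
    If several rows need different values, when clauses can be chained before the final otherwise. A small sketch with a hypothetical Status column:

    from pyspark.sql.functions import when, col
    # rows with Id 1 become 'Active', Id 2 become 'Inactive', everything else is unchanged
    df = df.withColumn('Status',
                       when(col('Id') == 1, 'Active')
                       .when(col('Id') == 2, 'Inactive')
                       .otherwise(col('Status')))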
    

    Create a linked service for ADLS, then write the DataFrame back to ADLS with the code below:

    import pandas as pd
    pdf = df.toPandas()
    pdf.to_csv('abfss://<filepath>/mycsv2.csv', storage_options={'linked_service': '<linkedserviceName>'})
    print("done")
    

    Complete code:

    from pyspark.sql.functions import when, col
    import pandas as pd

    # read the CSV from ADLS
    df = spark.read.load('abfss://rakeshsynapse@rakeshgen2.dfs.core.windows.net/mycsv2.csv', format='csv', header=True)
    # set Name to 'AE' on the row where Id is 1
    df = df.withColumn('Name', when(col('Id') == 1, 'AE').otherwise(df.Name))
    # write the updated data back through the linked service
    pdf = df.toPandas()
    pdf.to_csv('abfss://rakeshsynapse/mycsv2.csv', storage_options={'linked_service': 'AzureDataLakeStorage1'})
    print("done")
    


    The data will be updated successfully.
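
    As referenced above, if the underlying files are in Delta Lake format, a Spark pool can update rows in place instead of rewriting a CSV. A minimal sketch, assuming the delta package available on Synapse Spark pools and a hypothetical Delta folder path:

    from delta.tables import DeltaTable
    from pyspark.sql.functions import lit

    dt = DeltaTable.forPath(spark, 'abfss://<containerName>@<ADLSName>.dfs.core.windows.net/<deltaFolder>')
    # set Name to 'AE' on rows where Id is 1; serverless SQL can then query the new version
    dt.update(condition="Id = 1", set={"Name": lit('AE')})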