I am trying to update a few rows of a table using PySpark; the table lives in a Serverless SQL pool.
I tried the UPDATE command, but it fails with an error saying it is not supported in serverless SQL pools. Is there any way to update a few rows in an Azure Synapse Serverless SQL table?
If you are updating tables created over an ADLS data source in a serverless pool: as per the documentation, serverless SQL pools don't support updating Delta Lake files. You can use a serverless SQL pool to query the latest version of Delta Lake, but you need an Apache Spark pool in Synapse Analytics to update the data.
Read the data from the path with the code below:
# Read the CSV file from ADLS into a Spark DataFrame
df = spark.read.load('abfss://<containerName>@<ADLSName>.dfs.core.windows.net/mycsv2.csv', format='csv', header=True)
Update the data according to your requirements with the code below:
from pyspark.sql.functions import when, col

# Replace values in <colName> where <condition> holds; keep the other rows unchanged
df = df.withColumn('<colName>', when(<condition>, '<updateValue>').otherwise(col('<colName>')))
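To see what the when/otherwise expression does, here is a minimal local equivalent in plain pandas (the sample data is made up for illustration; no Spark session is needed):

```python
import pandas as pd

# Sample rows standing in for the CSV read from ADLS (hypothetical values)
df = pd.DataFrame({"Id": [1, 2, 3], "Name": ["AB", "CD", "EF"]})

# Same effect as when(col('Id') == 1, 'AE').otherwise(col('Name')):
# rows matching the condition get the new value, all others keep theirs
df["Name"] = df["Name"].mask(df["Id"] == 1, "AE")

print(df["Name"].tolist())  # ['AE', 'CD', 'EF']
```

Only the matching row is rewritten; the rest of the column passes through untouched, which is exactly the behavior of otherwise().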
Create a linked service for ADLS, then write the DataFrame back to ADLS with the code below:
import pandas as pd

# Convert to pandas and overwrite the file through the linked service;
# index=False keeps the pandas index from being written as an extra column
pdf = df.toPandas()
pdf.to_csv('abfss://<filepath>/mycsv2.csv', index=False, storage_options={'linked_service': '<linkedserviceName>'})
print("done")
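Putting the read, update, and write steps together, the round trip can be sketched locally with pandas and an in-memory CSV (the data and paths here are stand-ins for the ADLS file, so this runs without Spark or a linked service):

```python
import io
import pandas as pd

# CSV text standing in for mycsv2.csv in ADLS (hypothetical sample data)
src = io.StringIO("Id,Name\n1,AB\n2,CD\n")
df = pd.read_csv(src)

# Conditional update, mirroring when(col('Id') == 1, 'AE').otherwise(...)
df["Name"] = df["Name"].mask(df["Id"] == 1, "AE")

# Serialize back to CSV; index=False avoids adding an index column
out = df.to_csv(index=False)
print(out)
```

In Synapse the final to_csv call would instead target the abfss:// path with storage_options pointing at the linked service, as shown above.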
Complete code:
from pyspark.sql.functions import when, col
import pandas as pd

# Read the CSV file from ADLS
df = spark.read.load('abfss://rakeshsynapse@rakeshgen2.dfs.core.windows.net/mycsv2.csv', format='csv', header=True)

# Set Name to 'AE' for the row with Id == 1; leave other rows unchanged
df = df.withColumn('Name', when(col('Id') == 1, 'AE').otherwise(col('Name')))

# Convert to pandas and overwrite the file through the linked service
pdf = df.toPandas()
pdf.to_csv('abfss://rakeshsynapse/mycsv2.csv', index=False, storage_options={'linked_service': 'AzureDataLakeStorage1'})
print("done")
The data will be updated successfully.