Search code examples
azureazure-synapse

Azure Synapse Analytics: Parquet Create TempView and change column values


I am hoping if someone can help here. Below is part of script our developer used in Azure Synapse to read and create TempView from parquet file. It reads parquet file "Client" and create a TempView: aa_client:

aa_client = read_from_data_lake(aa_account_name, aa_file_system_name, aa_file_location + "Client", aa_file_format, aa_header, aa_infer_schema)
aa_client.createOrReplaceTempView("aa_client")
aa_client = aa_client.withColumn("client_group",when(aa_client.client_group=="UGR","UG"))
aa_client = aa_client.withColumn("last_update_timestamp",lit(None))

The first 2 lines of codes are working, but I was just wondering if I could also add the following:

  1. change value in column "client_group" on condition: if value is "UGR" change to "UG"
  2. change the datetime in "last_update_timestamp" to none I tried to do this using withColumn (last two line of codes) but they are not working at all. I am hoping if someone can help please?

Many Thanks Michelle


Solution

  • I reproduced the above and able to update the dataframe.

    This is my sample data in dataframe df:

    enter image description here

    I used the below code, and you can see the values are updated in the result dataframe.

    aa_client = df.withColumn("client_group",when(df.client_group=="UGR","UG").otherwise(df["client_group"]))
    
    aa_client = aa_client.withColumn("last_update_timestamp",lit(None))
    

    I have used otherwise after when in the first line to not update other values to null if the condition is not matched.

    Result:

    enter image description here

    If you want to update your temporary view, you have create or Replace the temporary view again after the above operations. dataframe operations won't update the views.