Tags: python, apache-spark, pyspark, azure-databricks

How to perform a divide operation in PySpark while updating a column?


I have a SQL query which I want to convert to PySpark. We have a column called CGPA that contains some values, and I need to divide those values by 25 where the conditions match:

UPDATE STUDENT_TABLE 
SET STUDENT_TABLE.CGPA = ([STUDENT_TABLE].CGPA/25)
WHERE (((STUDENT_TABLE.CGPA)>"5" Or (STUDENT_TABLE.CGPA)="100"));

The PySpark code I am trying is not working:

df=df.withColumn('CGPA' , F.when((col('CGPA') > '5') & (col('CGPA') == '100'),(df.CGPA/25) .otherwise(df['CGPA])

Solution

  • I have taken the following sample data for my dataframe to demonstrate.

    data = [[1,4],[2,10],[3,40],[4,100],[5,0],[6,5]]
    df = spark.createDataFrame(data=data,schema=['id','CGPA'])
    display(df)
    

    (screenshot: the sample dataframe, ids 1–6 with CGPA values 4, 10, 40, 100, 0, 5)

    • Looking at your SQL query, since you want to update when either of the conditions is true (i.e., an OR condition in the SQL UPDATE), you can use the following code (an equivalent expr-based version is sketched after the output):
    from pyspark.sql.functions import col, when
    
    # OR condition: divide CGPA by 25 when CGPA > 5 or CGPA == 100
    df.withColumn('CGPA', when((col('CGPA') > '5') | (col('CGPA') == '100'), df.CGPA / 25).otherwise(df['CGPA'])).show()
    
    # To keep the result, assign it back to df:
    # df = df.withColumn('CGPA', when((col('CGPA') > '5') | (col('CGPA') == '100'), df.CGPA / 25).otherwise(df['CGPA']))
    # df.show()
    

    (screenshot of the output: CGPA values matching the condition divided by 25)
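
    The same OR-condition update can also be written with a SQL CASE expression through expr, which maps almost one-to-one onto the original UPDATE statement (a minimal sketch reusing the sample df; expr is imported from pyspark.sql.functions):

    from pyspark.sql.functions import expr
    
    # CASE WHEN mirrors the SET ... WHERE logic of the SQL UPDATE
    df.withColumn(
        'CGPA',
        expr("CASE WHEN CGPA > 5 OR CGPA = 100 THEN CGPA / 25 ELSE CGPA END")
    ).show()
    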

    If the condition is AND instead, use this code:

    from pyspark.sql.functions import col, when
    
    # AND condition: divide CGPA by 25 only when CGPA > 5 and CGPA == 100
    df = df.withColumn('CGPA', when((col('CGPA') > '5') & (col('CGPA') == '100'), df.CGPA / 25).otherwise(df['CGPA']))
    df.show()
    

    (screenshot of the resulting dataframe)
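
    Note that withColumn only returns a new dataframe; nothing is written back to the source table. If STUDENT_TABLE is stored as a Delta table on Azure Databricks (an assumption, since the question does not say how the table is stored), the original UPDATE can instead be run directly with Spark SQL:

    # Sketch assuming STUDENT_TABLE is a registered Delta table (Delta Lake supports UPDATE)
    spark.sql("""
        UPDATE STUDENT_TABLE
        SET CGPA = CGPA / 25
        WHERE CGPA > 5 OR CGPA = 100
    """)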