I have a SQL query that I am trying to convert into pyspark the query looks like this
UPDATE EMPLOYEE_TABLE SET A_CONVERSION_TEMPLATE.[STATUS] = "OTHER/MENTIONED" WHERE (((EMPLOYEE_TABLE.[HOLIDAY_TAKEN]) Is Null));
Second Query:
UPDATE EMPLOYEE_TABLE SET EMPLOYEE_TABLE.[STATUS] = "OTHER/MENTIONED" WHERE (((EMPLOYEE_TABLE.STATUS) Is Null));
Third Query with two conditions
UPDATE EMPLOYEE_TABLE SET EMPLOYEE_TABLE.[LANGUAGE] = "ENGLISH",
WHERE (((EMPLOYEE_TABLE.LANGUAGE[])<>"ENGLISH") AND ((EMPLOYEE_TABLE.FLAGYESNO)="Y"));
The Pyspark queries I am trying are I have combined the first and second queries because in both we have to update the same column. The query I am using is
df=df.withColumn('STATUS', F.when( (col('STATUS')==''),'OTHER/MENTIONED').
otherwise(when( (col('HOLIDAY_TAKEN]')==''), 'OTHER/MENTIONED').
otherwise('NULL')))
When I am running my df it is updating everything to null in my 'STATUS' Column even the values which were already present it is changing it to 'NULL'
For the third query, I am getting no idea. Please let me know if need other information. Attaching the Output for more understanding enter image description here Thanks in advance for your responses
Create a data frame as follows:
+----------+----------+-------------+
|DEPARTMENT| STATUS|HOLIDAY_TAKEN|
+----------+----------+-------------+
| HR| null| TAKEN|
| MANAGER| null| TAKEN|
| HR| WORKING| ONLEAVE|
| ENGINEER|ONCONTRACT| SICKLEAVE|
| PA| null| null|
| CEO| ON VISA| null|
| ENGINEER| null| null|
| MANAGER| null| null|
+----------+----------+-------------+
Then write a condition satisfying both the conditions in a single query:
from pyspark.sql import functions as fx
from pyspark.sql.functions import *
expression = """CASE
WHEN (HOLIDAY_TAKEN IS NULL AND STATUS IS NULL) OR (HOLIDAY_TAKEN IS NOT NULL AND STATUS IS NULL) THEN 'OTHER/MENTIONED'
ELSE STATUS
END"""
df2 = df.withColumn('STATUS',fx.expr(expression))
df2.show()