Tags: t-sql, pyspark, apache-spark-sql, case-when, .when

Multiple WHEN condition implementation in Pyspark


I have the T-SQL code below, which I converted to PySpark, but the PySpark version gives me an error.

CASE
            WHEN time_on_site.eventaction = 'IN' AND time_on_site.next_action = 'OUT' AND time_on_site.timespent_sec < 72000 THEN 1  --  20 hours 
            WHEN time_on_site.eventaction = 'IN' AND time_on_site.next_action = 'OUT' AND time_on_site.timespent_sec >= 72000 THEN 0
            WHEN time_on_site.eventaction = 'IN' AND time_on_site.next_action = 'IN' AND time_on_site.timespent_sec <= 28800 THEN 2  -- 8 hours
            WHEN time_on_site.eventaction = 'IN' AND time_on_site.next_action = 'IN' AND time_on_site.timespent_sec > 28800 THEN 3
            WHEN time_on_site.type_flag = 'TYPE4' THEN 4
            ELSE NULL
         END AS "type"

Below is my PySpark script, which throws an error:

from pyspark.sql.functions import when

TOS=TOS.withColumn('type', F.when( (col('eventaction') == 'IN') & (col('next_action') == 'OUT') & ("timespent_sec < 72000") , 1).
                            when( (col('eventaction') == 'IN') & (col('next_action') == 'OUT') & ("timespent_sec >= 72000") , 0).
                            when( (col('eventaction') == 'IN') & (col('next_action') == 'IN') & ("timespent_sec <= 28800") , 2).
                            when( (col('eventaction') == 'IN') & (col('next_action') == 'IN') & ("timespent_sec > 28800") , 3).
                            when(col('type_flag')=='TYPE4', 4).otherwise('NULL')
                            )

Where am I going wrong!?


Solution

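  • Nest each subsequent when() inside otherwise(). The comparisons on timespent_sec must also be Column expressions such as col('timespent_sec') < 72000; a quoted "timespent_sec < 72000" is a plain Python string and is not evaluated as a condition. Use otherwise(None) to get a real null, since otherwise('NULL') stores the text 'NULL'.

    from pyspark.sql import functions as F
    from pyspark.sql.functions import col

    # Every branch condition is built from Column comparisons combined with &
    TOS = TOS.withColumn('type', F.when((col('eventaction') == 'IN') & (col('next_action') == 'OUT') & (col('timespent_sec') < 72000), 1).
                                otherwise(F.when((col('eventaction') == 'IN') & (col('next_action') == 'OUT') & (col('timespent_sec') >= 72000), 0).
                                otherwise(F.when((col('eventaction') == 'IN') & (col('next_action') == 'IN') & (col('timespent_sec') <= 28800), 2).
                                otherwise(F.when((col('eventaction') == 'IN') & (col('next_action') == 'IN') & (col('timespent_sec') > 28800), 3).
                                otherwise(F.when(col('type_flag') == 'TYPE4', 4).otherwise(None))))))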